Would PDF files show as duplicate content?

richie32
Posts: 21
Joined: 22 Nov 09

10 Apr 10 12:22 pm

Hi all,

If I were to put a PDF file in a zipped folder (for a later bonus email) in the folder with all my website files on HostGator, would robots crawl it, and would its content show as a duplicate of other content on my site?

A hyperlink to the file can be added to emails. I have tested it in my usual "learn by doing" way, and a right-click "Save Target As" download works well.

I'm very interested to know, as I was hoping the file would just sit there, given that it is in a totally different format from the other files. But images are in a different format too, and I am unsure about them as well.

Kerry

Last edited by michellerana on 16 Apr 10 6:53 am, edited 1 time in total.
Reason: improve title to describe the post better

wollowra
Posts: 869
Joined: 14 Mar 08

11 Apr 10 12:58 am

Hi Kerry,
Google does crawl PDF docs.
That being said, I would not worry too much about the duplicate content side of things, as Google will just not index the PDF.
If you have an HTML page with the content on it, get that indexed first. Then, if you put a PDF on the site for download, it will probably not be indexed because it MAY be viewed as duplicate, but you will not get penalized for it.
If you are really concerned about it, then just do a nofollow on the PDF.
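For reference, `nofollow` on a link only asks search engines not to follow that particular link; it does not keep the file out of the index if Google finds it some other way. The usual way to keep a PDF out of the index is a `noindex` directive, and because a PDF has no HTML `<head>` to hold a robots meta tag, that directive is sent as an `X-Robots-Tag` HTTP response header instead. The Python sketch below only illustrates the idea for a site served by your own code; the function name `robots_headers` and the extension list are hypothetical, not part of any hosting setup.

```python
from pathlib import PurePosixPath

# Hypothetical set of file types that should never be indexed.
NOINDEX_EXTENSIONS = {".pdf", ".zip"}


def robots_headers(path: str) -> dict[str, str]:
    """Return extra HTTP response headers for the file at `path`.

    Non-HTML files cannot carry <meta name="robots">, so the noindex
    directive travels as an X-Robots-Tag header instead.
    """
    suffix = PurePosixPath(path).suffix.lower()  # ".PDF" -> ".pdf"
    if suffix in NOINDEX_EXTENSIONS:
        return {"X-Robots-Tag": "noindex"}
    return {}  # ordinary pages get no extra robots header


print(robots_headers("/bonus/guide.pdf"))
print(robots_headers("/index.html"))
```

If the host runs Apache, as most cPanel shared hosts do, the same header is more commonly set in `.htaccess` with mod_headers (a `<FilesMatch "\.pdf$">` block containing `Header set X-Robots-Tag "noindex"`) than in application code.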

Google cannot read text in images or see whether images are duplicates. The only way they can tell is if you give the images the same name as each other in the ALT text.

Regards
Troy


burkhardt5
Posts: 216
Joined: 26 Jun 09

18 Jul 10 2:31 pm

That's good to hear. I wanted to use the exact same text at the very bottom of all my pages but was worried about duplicate content, so I made it into a JPEG image.

Paul J. Burkhardt

luckylook3
Posts: 28
Joined: 07 Mar 11

07 Mar 11 7:51 am

If you link to your PDF from your site, Google will find it.

I don't think it will count as duplicate content, but you can use the canonical tag just to be on the safe side.
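A PDF cannot contain an HTML `<link rel="canonical">` element, but Google also accepts the canonical hint as an HTTP `Link` header on the PDF response, pointing at the HTML page that carries the same text. A minimal Python sketch of building that header, assuming a hypothetical mapping from each PDF to its HTML twin (the paths, the domain, and the `pdf_headers` name are placeholders):

```python
from urllib.parse import urljoin

# Hypothetical mapping: downloadable PDF path -> HTML page with the same content.
PDF_TO_HTML = {"/downloads/bonus-guide.pdf": "/bonus-guide.html"}


def pdf_headers(site_root: str, pdf_path: str) -> dict[str, str]:
    """Return a canonical Link header for a PDF that duplicates an HTML page."""
    html_path = PDF_TO_HTML.get(pdf_path)
    if html_path is None:
        return {}  # no HTML twin known, so send no canonical hint
    target = urljoin(site_root, html_path)  # absolute URL of the HTML page
    return {"Link": f'<{target}>; rel="canonical"'}


print(pdf_headers("https://www.example.com/", "/downloads/bonus-guide.pdf"))
```

The canonical header is a hint rather than a block: the PDF stays downloadable from the email link, and Google is asked to treat the HTML page as the version to show.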


fastflipwebservices
Posts: 20
Joined: 22 Apr 11

25 Apr 11 9:51 am

There's a way to tell crawlers not to crawl that particular page, using robots.txt or a robots meta tag, if you have those files, that is. I usually use WordPress, so I can decide what the spiders will crawl.
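As a rough illustration of the robots.txt approach, Python's standard `urllib.robotparser` can check which URLs a given rule blocks. This sketch assumes the PDFs live in a hypothetical `/downloads/` folder; Python's parser only does prefix matching, so it will not evaluate Google's `*` and `$` wildcard patterns such as `Disallow: /*.pdf$`. Keep in mind that robots.txt stops crawling rather than indexing: a blocked URL that is linked from elsewhere can still show up in results, just without its content.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that keeps all crawlers out of a hypothetical /downloads/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without fetching anything


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """True if the rules above let `user_agent` crawl `url`."""
    return parser.can_fetch(user_agent, url)


for url in ("https://www.example.com/downloads/bonus.pdf",
            "https://www.example.com/index.html"):
    print(url, "->", "allowed" if is_allowed(url) else "blocked")
```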

cecille.l
Posts: 6369
Joined: 25 Feb 11

02 May 11 12:58 am

Hi Kerry,

If you have the same content on an HTML page and PDF file and both are up on your site, Google will count it as duplicate content but will choose the HTML over the PDF. You need to tweak the robots.txt so that the PDF files do not get indexed. You can find additional information at http://www.seroundtable.com/archives/021584.html

Hope that helps. Have a good day!

Cecille


rankwarrior
Posts: 24
Joined: 04 May 11
Trust:

04 May 11 3:08 pm

Agree with Cecille.

This topic was started on Apr 10, 2010 and has been closed due to inactivity. If you want to discuss this topic further, please create a new forum topic.
