Would PDF files show as duplicate content?

PremiumMember
richie32
Posts: 37
Joined: 22 Nov 09

Hi all,

If I were to put a PDF file in a zipped folder (for a later bonus email) in the same folder as all my website files on HostGator, would robots crawl it, and would its content show as a duplicate of other content on my site?

A hyperlink to the file can be added to emails. I tested it in my usual "learn by doing" way, and a right-click, "Save Target As" works well.

I'm very interested to know, as I was hoping the file would just sit there, given that it is a totally different format from the other files. But images are a different format too, and I am unsure about them as well.

Kerry
Last edited by michellerana on 16 Apr 10 6:53 am, edited 1 time in total.
Reason: improve title to describe the post better
 

Moderator
wollowra
Posts: 1283
Joined: 14 Mar 08
Hi Kerry,
Google does crawl PDF docs.
That being said, I would not worry too much about the duplicate content side of things, as Google will just not index the PDF.
If you have an HTML page with the content on it, get that indexed first. If you then put a PDF on the site for download, it will not be indexed because it MAY be viewed as duplicate, but you will not get penalized for it.
If you are really concerned about it, then just put a nofollow on the link to the PDF.

Google cannot read text in images or see if images are duplicates. The only way it can tell is if you give the images the same ALT text as each other.

Regards
Troy
Enjoy the little things, for one day you may look back and realize
they were the big things.

-- Robert Brault
 
burkhardt5
Posts: 223
Joined: 26 Jun 09
That's good to hear. I wanted to use the exact same text at the very bottom of all my pages but was worried about duplicate content, so I made it into a JPEG image.
Never give up
Paul J. Burkhardt
 
luckylook3
Posts: 28
Joined: 07 Mar 11
If you link to your PDF from your site, Google will find it.

I don't think it will count as duplicate content, but you can use the canonical tag just to be on the safe side.
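
A PDF has no HTML head to hold a canonical tag, so one option is to send the canonical as an HTTP "Link" response header instead. Here is a minimal Python sketch of what that header looks like. The function name and the example.com URL are made up for illustration, and how you attach the header depends on your host and server setup.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Build an HTTP Link header naming canonical_url as the canonical version.

    Hypothetical helper for illustration; it only formats the header
    and does not send anything.
    """
    return ("Link", f'<{canonical_url}>; rel="canonical"')


if __name__ == "__main__":
    # Assumed URL of the HTML page that carries the same content as the PDF.
    name, value = canonical_link_header("https://example.com/guide.html")
    print(f"{name}: {value}")
```

The server would send this header with the PDF response, pointing at the HTML page you want treated as the main version.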
fastflipwebservices
Posts: 22
Joined: 22 Apr 11
You can use your robots.txt file or a robots meta tag to tell spiders not to crawl that particular page, if you have those files, that is. I usually use WordPress, so I can decide what the spiders will crawl.
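
As a rough illustration of the robots.txt approach, here is a minimal Python sketch that checks whether a rule would keep well-behaved spiders away from a PDF. It assumes the file sits in a made-up /bonus/ folder on example.com; both names are placeholders, not anything from this thread. It uses the standard library's urllib.robotparser, which only models the rules and does not ask Google anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the /bonus/ folder name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bonus/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The PDF in the blocked folder versus an ordinary page on the site.
    print(is_crawlable(ROBOTS_TXT, "https://example.com/bonus/guide.pdf"))
    print(is_crawlable(ROBOTS_TXT, "https://example.com/index.html"))
```

Keeping bonus files in their own folder means a single Disallow line covers all of them, rather than one rule per file.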
Site Admin
cecille.l
Posts: 7002
Joined: 25 Feb 11
Hi Kerry,

If you have the same content on an HTML page and PDF file and both are up on your site, Google will count it as duplicate content but will choose the HTML over the PDF. You need to tweak the robots.txt so that the PDF files do not get indexed. You can find additional information at http://www.seroundtable.com/archives/021584.html

Hope that helps. Have a good day!
Cecille


 
rankwarrior
Posts: 32
Joined: 04 May 11
Agree with Cecille.

This topic was started on Apr 10, 2010 and has been closed due to inactivity. If you want to discuss this topic further, please create a new forum topic.
