Would PDF files show as duplicate content?
-
Richie32 - Posts: 37
- Joined: 22 Nov 09
- Location: New Zealand
10 Apr 10 12:22 pm
Would PDF files show as duplicate content?
Hi all,
If I were to put a PDF file in a zipped folder (for a later bonus email) in the folder with all my website files at HostGator, would robots crawl it, and would the content in it show as duplicate of other content on my site?
A hyperlink to the file can be added to emails. I have tested it in my usual "learn by doing" way, and a right-click, "save target as" type of thing works well.
Very interested to know, as I was hoping the file would just sit there given that it is a totally different format from the other files. But images are a different format too, and I am unsure about them as well.
Kerry
Last edited by michellerana on 16 Apr 10 6:53 am, edited 1 time in total.
Reason: improve title to describe the post better
-
wollowra - Posts: 1268
- Joined: 14 Mar 08
- Location: Australia
11 Apr 10 12:58 am
Hi Kerry,
Google does crawl PDF docs.
That being said, I would not worry too much about the duplicate content side of things, as Google will most likely just not index the PDF.
If you have an HTML page with the content on it, get that indexed first. If you then put a PDF on the site for download, it will likely not be indexed because it MAY be viewed as duplicate, but you will not get penalized for it.
If you are really concerned about it, then just put a nofollow on the link to the PDF.
Google cannot read text in images or see if images are duplicates. The only way it can tell is if you give the images the same ALT text as each other.
Regards
Troy
Enjoy the little things, for one day you may look back and realize
they were the big things.
-- Robert Brault
-
burkhardt5 - Posts: 100
- Joined: 26 Jun 09
- Location: United States
18 Jul 10 2:31 pm
That's good to hear. I wanted to use the exact same text at the very bottom of all my pages but was worried about dup. content, so I made it into a JPEG image.
Never give up
-
luckylook3
- Posts: 28
- Joined: 07 Mar 11
- Location: United States
07 Mar 11 7:51 am
If you link to your PDF from your site, Google will find it.
I don't think it will count as duplicate content, but you can use the canonical tag just to be on the safe side.
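A PDF has no HTML head to put a canonical tag in, so the usual route is to send the same hint as an HTTP "Link" response header. A minimal sketch of building that header in Python, assuming a hypothetical example.com URL for the HTML page that carries the same text (the function name is made up for illustration, and how you attach the header depends on your host):

```python
def canonical_link_header(canonical_url: str) -> str:
    """Return an HTTP header line naming the canonical URL for a file."""
    # The header carries the same hint a <link rel="canonical"> tag would
    # carry in an HTML page.
    return f'Link: <{canonical_url}>; rel="canonical"'


# Hypothetical URL of the HTML page with the same content as the PDF.
print(canonical_link_header("https://www.example.com/bonus-guide.html"))
```

The server would send this header only with the PDF response, pointing at the HTML version you want indexed.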
-
fastflipwebservices
- Posts: 22
- Joined: 22 Apr 11
- Location: Germany
25 Apr 11 9:51 am
There's a way to tell robots not to crawl that particular page, using your robots.txt or a robots meta tag. If you have those files, that is. I usually use WordPress, so I can decide what the spiders will crawl.
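For anyone who wants to check a robots.txt rule before relying on it, here is a rough sketch using Python's standard urllib.robotparser. The /bonus/ folder, the example.com URLs, and the is_crawlable helper are all assumptions for illustration, not anything from the original poster's site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the /bonus/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bonus/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Report whether the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable(ROBOTS_TXT, "https://www.example.com/bonus/guide.pdf"))  # False
print(is_crawlable(ROBOTS_TXT, "https://www.example.com/index.html"))       # True
```

Note that this parser does simple path-prefix matching, so the sketch blocks a whole folder rather than using a wildcard pattern for PDF files.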
-
Cecille L - Posts: 1473
- Joined: 25 Feb 11
- Location: Philippines
02 May 11 12:58 am
Hi Kerry,
If you have the same content on an HTML page and PDF file and both are up on your site, Google will count it as duplicate content but will choose the HTML over the PDF. You need to tweak the robots.txt so that the PDF files do not get indexed. You can find additional information at http://www.seroundtable.com/archives/021584.html
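As a possible alternative to editing robots.txt, Google also documents an "X-Robots-Tag" response header for keeping non-HTML files such as PDFs out of the index. A minimal sketch of choosing that header by file extension, assuming a made-up helper name and an assumed list of extensions (how the header actually gets sent depends on your server setup):

```python
from pathlib import PurePosixPath

# Assumed list of file types to keep out of the index; adjust to taste.
NOINDEX_EXTENSIONS = {".pdf", ".zip"}


def robots_headers(url_path: str) -> dict:
    """Return extra response headers for a URL path (empty if none apply)."""
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in NOINDEX_EXTENSIONS:
        # Asks search engines not to index this file.
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers("/bonus/guide.pdf"))
print(robots_headers("/index.html"))
```

One design point: a crawler can only see this header if it is allowed to fetch the file, so it would not be combined with a robots.txt block on the same path.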
Hope that helps. Have a good day!
Cecille
http://www.affilorama.com/affiloblueprint
Build a Successful Website in 12 Weeks
Add us on Google Plus: http://www.affilorama.com/googleplus
-
rankwarrior
- Posts: 32
- Joined: 04 May 11
- Location: Great Britain
04 May 11 3:08 pm
Agree with Cecille.
