03 Jan 11 1:50 am
Centered wrote:Based on your post above, can we conclude that the Google and Yahoo search engines must now base their algorithms on Levenshtein distance?
You won't find any official statement from Google mentioning the exact algorithm they use, simply because revealing it would undermine their process for detecting duplicate and plagiarized content.
Google probably uses a custom or proprietary algorithm for comparing articles, judging from Google's patent application on near-duplicate content (granted December 2009).
Source: Google Patent Granted on Duplicate Content Detection in a Web Crawler System.
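For reference, here is a minimal sketch of the Levenshtein distance mentioned in the question. It is only an illustration of that textbook measure, not a claim about what Google, Yahoo, or our own tool actually runs. The function names levenshtein and similarity are made up for this example, and turning the distance into a 0-to-1 score by dividing by the longer length is an assumption, just one common convention.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical 0..1 score: 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("kitten", "sitting"))
```

Note that this works character by character, so it gets slow on full-length articles and is sensitive to small wording changes, which is one reason a search engine would be unlikely to rely on it alone.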
Centered wrote:...what about the articles that have been posted for some time and based their uniqueness on Affilorama's compare tool, whose custom solution is quite different from most other compare tools? Aren't they now endangered by a duplicate content penalty from the search engines?
First, the duplicate content penalty is still a subject of debate. The majority of webmasters (and even Google itself) say that it is a myth. Google, for sure, does not want the same or near-identical content to fill up the first 10 or 15 of its search results. Hence, a website with very similar content will likely be placed lower in the results pages, but this does not necessarily mean the site is being 'penalized' for having similar or duplicate content.
Second, we have not yet encountered any website that suffered from low search rankings just from using our Article Compare tool, although we get inquiries from time to time about the varying results between our tool and other commercially available comparison tools. Again, our algorithm is still under review at this time.
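To give a rough idea of why two comparison tools can disagree on the same pair of articles, here is a small sketch that scores one pair of texts in two different ways. Neither function is our Article Compare algorithm or any other vendor's; char_ratio and shingle_jaccard are hypothetical names, and the 3-word shingle size is an arbitrary assumption for the example. The first uses Python's standard difflib.SequenceMatcher on characters, the second compares overlapping word triples.

```python
from difflib import SequenceMatcher


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity from the standard library (0..1)."""
    return SequenceMatcher(None, a, b).ratio()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences; k=3 is an assumed default."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def shingle_jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0..1)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts count as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    rewritten = "the quick brown cat jumps over the lazy dog"
    print("character ratio:", char_ratio(original, rewritten))
    print("shingle Jaccard:", shingle_jaccard(original, rewritten))
```

Changing a single word barely moves the character-level score, but it breaks three of the seven word triples in each sentence, so the shingle score for this pair drops to 0.4. Both numbers are "correct" for their own method, which is why percentages from different tools should not be compared directly.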