Search Engine Cloaking FAQs: an Interview With Dan Kramer, Creator of KloakIt
I recently asked Dan Kramer of KloakIt if I could interview him about some common cloaking questions I get asked, and he said sure.
How does cloaking work?
It is easiest to explain if you first understand exactly what cloaking is. Web page cloaking is the act of showing different content to different visitors based on some criterion, such as whether they are a search engine spider, or whether they are located in a particular country.
A cloaking program or script looks at several pieces of information to determine the identity of a visitor: the IP address the request comes from, plus the browser's User-Agent string and the referring URL, both of which are sent in the HTTP headers of the request for the web page. The script makes a decision based on this information and serves the appropriate content to the visitor.
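As a rough illustration of the decision described above, here is a minimal Python sketch. The names (`Visitor`, `classify`, `choose_content`) and the tiny lookup lists are hypothetical placeholders invented for this example, not part of any real cloaking package.

```python
from dataclasses import dataclass

# Hypothetical placeholder data; a real list would be far larger and kept up to date.
KNOWN_SPIDER_IPS = {"192.0.2.10", "192.0.2.11"}       # reserved documentation addresses
KNOWN_SPIDER_AGENT_TOKENS = ("googlebot", "bingbot")  # substrings seen in spider User-Agents


@dataclass
class Visitor:
    ip: str          # address the request came from
    user_agent: str  # value of the User-Agent request header
    referrer: str    # value of the Referer request header ("" if absent)


def classify(visitor: Visitor) -> str:
    """Return "spider" or "human" based on the signals available."""
    if visitor.ip in KNOWN_SPIDER_IPS:
        return "spider"
    agent = visitor.user_agent.lower()
    if any(token in agent for token in KNOWN_SPIDER_AGENT_TOKENS):
        return "spider"
    # The referrer is carried along but this sketch does not act on it.
    return "human"


def choose_content(visitor: Visitor) -> str:
    """Pick which version of the page to serve."""
    return "optimized page" if classify(visitor) == "spider" else "landing page"
```

A call such as `choose_content(Visitor("203.0.113.5", "Mozilla/5.0", ""))` would fall through both checks and return the landing page.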
For SEO purposes, cloaking is done to serve optimized versions of web pages to search engine spiders and hide that optimized version from human visitors.
What are the risks associated with cloaking? What types of sites should consider cloaking?
Many search engines discourage the practice of cloaking. They threaten to penalize or ban those caught using cloaking techniques, so it is wise to plan a cloaking campaign carefully. I tell webmasters that if they are going to cloak, they should set up separate domains from their primary website and host the cloaked pages on those domains. That way, if their cloaked pages are penalized or banned, it will not affect their primary website.
The types of sites that successfully cloak fall into a couple of categories. First, you have those who are targeting a broad range of "long tail" keywords, typically affiliate marketers and so on. They can use various cloaking software packages to easily create thousands of optimized pages which can rank well. Here, quantity is the key.
Next, you have those with websites that are difficult for search engines to index. Some people with Flash-based websites want to present search engine spiders with text versions of their sites that can be indexed, while still delivering the Flash version to human visitors at the same URL.
What is the difference between IP delivery and cloaking?
IP delivery is a type of cloaking. I mentioned above that there are several criteria by which a cloaking script judges the identity of a visitor. One of the most important is the IP address of the visitor.
Every computer on the internet is identified by its IP address. Lists are kept of the IP addresses of the various search engine spiders. When a cloaking script has a visitor, it looks at their IP address and compares it against its list of search engine spider IP addresses. If a match is found, it delivers up the optimized version of the web page. If no match is found, it delivers up the "landing page", which is meant for human eyes. Because the IP address is used to make the decision, it's called "IP delivery".
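The lookup described above might be sketched like this in Python, using the standard `ipaddress` module. The networks listed are reserved documentation ranges standing in for a real spider list, and the function names are assumptions made for the example.

```python
import ipaddress

# Hypothetical spider ranges (reserved documentation networks, not real spider addresses).
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_spider_ip(ip: str) -> bool:
    """True if the address falls inside any listed spider network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in SPIDER_NETWORKS)


def page_for(ip: str) -> str:
    """IP delivery: the visitor's address alone decides which page is served."""
    return "optimized" if is_spider_ip(ip) else "landing"
```

Storing ranges rather than single addresses keeps the list shorter, though whether a given spider list is published as ranges or as individual addresses will vary.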
IP delivery is considered the best method of cloaking because of the difficulty involved in faking an IP address. There are other methods of cloaking, such as by User-Agent, which are not as secure. With User-Agent cloaking, the User-Agent string in the HTTP headers is compared against a list of search engine spider User-Agents. An example of a search engine spider User-Agent is
The problem with User-Agent cloaking is that it is very easy to fake a User-Agent, so your competitor could easily decloak one of your pages by "spoofing" the User-Agent of his browser to make it match that of a search engine spider.
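To make the weakness concrete, here is a small Python sketch of a User-Agent check, under the assumption that it is a simple substring match. The token list and function name are illustrative only.

```python
# Substrings that appear in some well-known spider User-Agent strings.
SPIDER_AGENT_TOKENS = ("googlebot", "bingbot", "slurp")


def looks_like_spider(user_agent: str) -> bool:
    """User-Agent cloaking check: a case-insensitive substring match."""
    agent = user_agent.lower()
    return any(token in agent for token in SPIDER_AGENT_TOKENS)


# A User-Agent string of the form Google has published for its crawler.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Anyone can send the same header from an ordinary machine, and the check
# cannot tell the difference because it never looks at the IP address.
spoofed_request_headers = {"User-Agent": GOOGLEBOT_UA}
```

Passing `spoofed_request_headers["User-Agent"]` to `looks_like_spider` succeeds no matter who sent the request, which is exactly how a competitor could decloak a page.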
How hard is it to keep up with new IP addresses? Where can people look to find new IP addresses?
It's a chore the average webmaster probably wouldn't relish. There are always new IP addresses to add (the best cloaking software will do this automatically), and it is a never-ending task. First, you have to set up a network of bot-traps that notify you whenever a search engine spider visits one of your web pages. You can have a CGI script that does this for you, and possibly check the IP address against already known search engine spiders. Then, you can take the list of suspected spiders generated that way and do some manual checks to make sure the IP addresses are actually registered to search engine companies. Also, you have to keep an eye out for new search engines... you would not believe how many new startup search engines there are every month.
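Part of that checking can be automated. The sketch below is one possible approach, not the method described in the interview: it records bot-trap hits from unknown addresses and then runs a forward-confirmed reverse DNS check in place of a manual registration lookup. The hostname suffixes and all function names are assumptions for illustration.

```python
import socket
from typing import Callable, List, Set

# Assumed hostname suffixes for illustration; check each engine's own documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def record_trap_hit(ip: str, known: Set[str], suspects: Set[str]) -> None:
    """Bot-trap hook: remember addresses that are not already known spiders."""
    if ip not in known:
        suspects.add(ip)


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    return socket.gethostbyname_ex(hostname)[2]


def verify_spider_ip(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS check for one suspected spider address."""
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(TRUSTED_SUFFIXES):
            return False
        # The hostname must resolve back to the same address, otherwise
        # the reverse record could have been set by anyone.
        return ip in forward(hostname)
    except OSError:
        return False  # lookup failed: treat as unverified
```

The lookup functions are passed in as parameters so the check can be exercised with canned answers before it is pointed at live DNS.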
Instead of doing it all yourself, you can get IP addresses from some resources that can be found on the web. I manage a free public list of search engine spider IP addresses. There are also some commercial resources available (no affiliation with me). In addition to those lists, you can find breaking info at the Search Engine Spider Identification Forum at WebmasterWorld.
Is cloaking ethical? Or, as it relates to SEO, is ethics typically a self-serving word?
Some would say that cloaking is completely ethical, others disagree. Personally, my opinion is that if you own your website, you have the right to put whatever you like on it, as long as it is legal. You have the right to choose which content you display to any visitor. Cloaking for SEO purposes is done to increase the relevancy of search engine queries... who wants visitors that aren't interested in your site?
On the other hand, as you point out, the ethics of some SEOs are self-serving. I do not approve of those who "page-jack" by stealing others' content and cloaking it. Also, if you are trying to get rankings for one topic, and sending people to a completely unrelated web page, that is wrong in my book. Don't send kids looking for Disney characters to your porn site.
I have seen many garbage subdomains owning top 10 rankings for tens to hundreds of thousands of phrases in Google recently. Do you think this will last very long?
No, I don't. I believe this is due to an easily exploitable hole in Google's algorithm that really isn't related to cloaking, although I think some of these guys are using cloaking techniques as a traffic management tool. Google is already cleaning up a lot of those SERPs and will soon have it under control. The subdomain loophole will be closed soon.
How long does it usually take each of the engines to detect a site that is cloaking?
That's a question that isn't easily answered. The best answer is "it depends". I've had sites that have never been detected and are still going strong after five or six years. Others are banned after a few weeks. I think you will be banned quickly if you have a competitor who believes you might be cloaking and submits a spam report. Also, if you are creating a massive number of cloaked pages in a short period of time, I think this is a flag for search engines to investigate. Same goes for incoming links... try to get them in a "natural" looking progression.
What are the best ways to get a cloaked site deeply indexed quickly?
My first tip would be to have the pages located on a domain that is already indexed -- the older the better. Second, make sure the internal linking structure is adequate to the task of spidering all of the pages. Third, make sure incoming links from outside the domain link to both the index (home) cloaked page and to other "deep" cloaked pages.
As algorithms move more toward links, and then perhaps more toward the social elements of the web, do you see any social techniques replacing the effect of cloaking?
Cloaking is all about "on-page" optimizing. As links become more important to cracking the algorithms, the on-page factors decline in importance. The "new web" is focused on the social aspects of the web, with people critiquing others' content, linking out, posting their comments, blogging, etc. The social web is all about links, and as links become more of a factor in rankings, the social aspects of the web become more important.
However, while what people say about your website will always be important, what your website actually says (the text indexed from your site) cannot be ignored. The on-page factors in rankings will never go away. I cannot envision "social techniques" (I guess we are talking about spamming Slashdot or Digg?) replacing on-page optimization, but it makes a hell of a supplement... the truly sophisticated spammer will make use of all the tools in his toolbox.
How does cloaking relate to poker? And can you cheat at online poker, or are you just head and shoulders above the rest of the SEO field?
Well, poker is a game of deception. As a pioneer in the cloaking field, I suppose I have picked up a knack for the art of lying through my teeth. In the first SEO Poker Tournament, everybody kept folding to my bluffs. While it is quite tempting to run poker bots and cheat, I find there is no need with my excellent poker skills. Having said all that, I quietly await the next tournament, where I'm sure I'll be soundly thrashed in the first few minutes ;)
How long do you think it will be before search engines can tell the difference between real page content and garbled Markov-chain-driven content? Do you think it will be computationally worthwhile for them to look at that? Or can they leverage link authority and usage data to negate needing to look directly at readability as a data point?
I think they can tell now, if they want to devote the resources to it.
However, this type of processing is time/CPU intensive and I'm not sure they want to do it on a massive scale. I'm not going to blueprint the techniques they should use to pick which pages to analyze, but they will have to make some choices. Using link data to weed out pages they don't need to analyze would be nice, but in this age of rampant link selling, link authority may not be as reliable an indicator as they would like. Usage data may not be effective because in order to get it, the page has to be indexed so they can track the clicks, defeating the purpose of spam elimination. Their best bet would be to look at creation patterns... look to see which domains are creating content and gaining links at an unreasonable rate.
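The "creation patterns" idea can be illustrated with a very simple rate check. This is only a guess at the general shape of such a filter, not a description of what any search engine does. The window, the threshold and the function name are invented for the example.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Set, Tuple


def flag_fast_growing_domains(
    new_pages: Iterable[Tuple[str, date]],
    today: date,
    window_days: int = 7,       # assumed look-back window
    max_new_pages: int = 1000,  # assumed threshold for "unreasonable"
) -> Set[str]:
    """Return domains that added more than max_new_pages pages in the window."""
    cutoff = today - timedelta(days=window_days)
    # Count only pages first seen after the cutoff date, grouped by domain.
    counts = Counter(domain for domain, first_seen in new_pages if first_seen > cutoff)
    return {domain for domain, count in counts.items() if count > max_new_pages}
```

The same counting could be applied to newly discovered inbound links instead of pages, so that only the flagged domains are passed on to the more expensive content analysis.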
What is the most money you have ever made from ranking for a misspelled word? And if you are bolder than I am, what word did you spell wrong so profitably?
I made a lot of money from ranking for the word "incorparating". This was waaay back in the day. I probably made (gross) in the high five figures a year for several years from that word. Unfortunately, either people became better spellers or search engines got smarter, because the traffic began declining for the word about four or five years ago.