When you're desperate and can't find a particular page on a small site, there's one page you can turn to: the sitemap. In this lesson, we're going to take a look at sitemaps: what they do, how they're useful and how to create one.
It's not just lost and frustrated web users who love sitemaps. The search engines love them too. A sitemap is an easy way to ensure that the search engines are able to see every page on your website, giving you a better chance of having all your pages indexed. There are two different types of sitemap:
The first type, the HTML sitemap, is designed for humans, although the search engines find it handy as well. It's just a normal HTML page full of links to all the pages on your website. This page is usually reached through a small link in the footer of your pages (there's no need to make it a giant, obvious link in your navigation). There's nothing special you need to do to create it: just create it like any other HTML page.
The second type of sitemap you should be familiar with is the XML sitemap, commonly called a “Google sitemap” even though Google, Yahoo and MSN all use it. It's an XML file that lists the address of each page on your website.
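To give a feel for what an XML sitemap looks like, here is a minimal Python sketch that builds one with the standard library's xml.etree.ElementTree. It assumes the urlset/url/loc structure and namespace from the Sitemaps protocol published at sitemaps.org; the function name and the example.com addresses are placeholders, not part of this lesson.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org, schema 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap, as a string, with one <url><loc> per address."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the address for us.
        ET.SubElement(url, "loc").text = address
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "http://www.example.com/",
        "http://www.example.com/contact.html",
    ]))
```

Letting a library write the XML, rather than gluing strings together, means an address containing "&" comes out as "&amp;", which the protocol requires.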
It's not really human-friendly (unless you like reading XML), but alongside the address of each page it can give the search engines a lot more information than the traditional sitemap, such as:
Change frequency: (optional)
This tells the search engines how often the page is likely to change, using one of the values always, hourly, daily, weekly, monthly, yearly or never. As with most information in the sitemap, what you say here is simply taken as a “hint” by the search engines. Even if you say that your page changes hourly, you might not find the spiders crawling your site every hour. Similarly, you might find them wandering through your “yearly” page more often than you suggested.
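Since a value the protocol doesn't define is no use as a hint, a sitemap generator might check it before writing it out. This Python sketch assumes you want to accept input in any case and emit the lower-case forms listed in the protocol; normalise_changefreq is a hypothetical helper name.

```python
# The seven values the Sitemaps protocol (sitemaps.org) allows in <changefreq>.
VALID_CHANGEFREQ = ("always", "hourly", "daily", "weekly", "monthly", "yearly", "never")


def normalise_changefreq(value):
    """Lower-case a change-frequency hint and reject anything the protocol doesn't define."""
    cleaned = value.strip().lower()
    if cleaned not in VALID_CHANGEFREQ:
        raise ValueError(
            f"unsupported changefreq {value!r}; expected one of {', '.join(VALID_CHANGEFREQ)}"
        )
    return cleaned


if __name__ == "__main__":
    print(normalise_changefreq(" Weekly "))  # weekly
```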
Priority: (optional)
Which is the more important page on your site: your home page or your “contact us” form? Sitemaps allow you to give a weight or priority to your pages, relative to each other. The value ranges from 0.0 for your least important page up to 1.0 for your most important page, with the default value being 0.5.
Because priorities are relative to each other, you can't just say that all your pages are the most important: that makes them all look the same to the search engines and completely defeats the purpose. The idea is to give higher priority to pages like your index page, your review pages and other pages that you really want to see in the search engines, and lower priority to pages that don't really need to be there.
This information might help the search engines choose between different pages on the same site. It won't affect your search engine rankings, but at least you won't find your “contact us” page starring in the search engine listings while your articles are left on the kerb.
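The two rules above (stay within 0.0 to 1.0, and don't give everything the same value) are easy to check automatically. Here is a Python sketch that assumes your priorities are kept in a simple dictionary of URL to value; priority_warnings is a hypothetical helper, not part of any tool mentioned in this lesson.

```python
def priority_warnings(priorities):
    """Check a hypothetical {url: priority} mapping and return a list of warnings.

    Priorities must lie between 0.0 and 1.0. Because they are relative to each
    other, giving every page the same value is flagged as well.
    """
    warnings = []
    for url, value in priorities.items():
        if not 0.0 <= value <= 1.0:
            warnings.append(f"{url}: priority {value} is outside 0.0 to 1.0")
    if len(priorities) > 1 and len(set(priorities.values())) == 1:
        warnings.append("every page has the same priority, so none of them stands out")
    return warnings


if __name__ == "__main__":
    pages = {
        "http://www.example.com/": 1.0,
        "http://www.example.com/reviews/": 0.8,
        "http://www.example.com/contact.html": 0.2,
    }
    print(priority_warnings(pages))  # []
```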
Last modified: (optional)
This tells the search engines when your page was last changed. The idea is that if they can see your page hasn't changed, they might not bother to spider it again, saving a bit of time and energy. In reality, the search engines have other ways of determining whether a page has been updated, so it's not really necessary to include this one, particularly if you're updating your sitemap by hand, which could get very tedious!
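Putting the optional tags together, a single page's entry could be built as in the following Python sketch. url_entry is a hypothetical helper, the address and date are placeholders, and it assumes the last-modified value is a plain date, which the protocol's W3C Datetime format allows.

```python
import datetime
import xml.etree.ElementTree as ET


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element. Only <loc> is required; the other tags are optional hints."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        # Assumes a datetime.date: isoformat() gives YYYY-MM-DD, a valid W3C Datetime.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        ET.SubElement(url, "priority").text = str(priority)
    return url


if __name__ == "__main__":
    entry = url_entry(
        "http://www.example.com/articles/",
        lastmod=datetime.date(2024, 1, 15),
        changefreq="weekly",
        priority=0.8,
    )
    print(ET.tostring(entry, encoding="unicode"))
```

With the values in the example it prints <url><loc>http://www.example.com/articles/</loc><lastmod>2024-01-15</lastmod><changefreq>weekly</changefreq><priority>0.8</priority></url>. Leaving out an optional argument simply leaves out that tag.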
It is possible to write your sitemap by hand. The language is quite simple, but it's a long and arduous process if you've got a lot of pages!
A better idea is to use an XML sitemap generation tool.
If you're using WordPress, Drupal, phpBB, or many other content management systems, there are plugins and modules that you can incorporate into your CMS to automatically generate and maintain your sitemap. There's also a plugin for Dreamweaver which you can run to generate your sitemap.
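The core of such a tool can be quite small. As a rough Python sketch, assuming a static site where every .html file under one folder is a page and its path maps directly onto its URL, a generator might look like this; the function name, folder and base URL are hypothetical, and real plugins handle many more cases.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_from_folder(site_root, base_url):
    """Return an ElementTree sitemap with one <url> per .html file under site_root.

    Assumes (hypothetically) that files map one-to-one onto URLs, e.g.
    site_root/about/team.html -> base_url + "about/team.html".
    index.html files are listed by file name, not as folder URLs.
    """
    root = Path(site_root)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Sort so the sitemap comes out in a stable order between runs.
    for page in sorted(root.rglob("*.html")):
        relative = page.relative_to(root).as_posix()
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + relative
    return ET.ElementTree(urlset)
```

To produce the file you'd upload, you could call sitemap_from_folder("public_html", "http://www.example.com/").write("sitemap.xml", encoding="UTF-8", xml_declaration=True).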
The next thing you need to do is upload the sitemap to your webserver (just like you upload any other file) and then tell the search engines where they can find it.
If you sign up for Google's Webmaster Tools at www.google.com/webmasters/tools, then once you've added and verified your site you'll be able to tell Google where to find your sitemap. Note that for this method you'll need a Google account, and you'll need to verify your site by uploading a key file to your website or adding a meta tag to your page. Once you've verified your site, you shouldn't have to do it again (unless you delete the file or the meta tag!).
This method isn't quite as friendly as Google's. Try entering this into your browser (changing it to the address of your own sitemap, of course).
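The sitemaps.org protocol describes this kind of browser submission as a plain HTTP request of the form <searchengine_URL>/ping?sitemap=<sitemap_url>, with the sitemap address URL-encoded. The Python sketch below only builds such an address. The endpoint in it is a hypothetical placeholder rather than any real engine's, and several engines have since retired their ping endpoints, so check the engine's current documentation before relying on this.

```python
from urllib.parse import quote


def ping_url(ping_endpoint, sitemap_url):
    """Build a sitemap 'ping' address: <endpoint>?sitemap=<URL-encoded sitemap address>."""
    # safe="" makes quote() encode ":" and "/" as well, so the whole address is escaped.
    return f"{ping_endpoint}?sitemap={quote(sitemap_url, safe='')}"


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the one your search engine documents.
    print(ping_url("https://searchengine.example/ping", "http://www.example.com/sitemap.xml"))
```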
The method for Yahoo is similar to the one for Google: you need to add your site to Yahoo's Site Explorer service, and from there you can add a sitemap. You'll need to sign up for a Yahoo account (if you don't already have one) and verify your site by uploading a key file or adding some metadata to your page.
If all this seems like a lot of jumping through hoops, there's an all-in-one solution: adding one line of code to your robots.txt file.
Each time a web crawler visits your site, the first thing it looks for is a robots.txt file. This is the place where you tell the crawlers and other “robots” which pages you don't want them to access or index. If they're polite crawlers (and most of the search engine ones are), they'll respect the wishes of the robots.txt file.
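Python's standard library includes a robots.txt parser, urllib.robotparser, which shows how a polite crawler applies these rules before fetching a page. This sketch uses a hypothetical robots.txt that keeps all crawlers out of a /private/ folder; ExampleBot is a made-up user agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler ("*") is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for page in ("http://www.example.com/articles/", "http://www.example.com/private/drafts.html"):
    # A polite crawler asks before fetching each page.
    verdict = "allowed" if parser.can_fetch("ExampleBot", page) else "disallowed"
    print(page, verdict)
```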
It's also a good place to tell them about your sitemap file; the main search engines all look for it there.
(If you want more information about robots.txt files, or how to create one, there's plenty of easy-to-understand information at robotstxt.org.)
To add your sitemap, just add this one line of code to your robots.txt file:
Sitemap: http://www.example.com/sitemap.xml
You can paste it anywhere inside the file; it doesn't matter. (Of course, replace the example address with the address of your sitemap. Make sure it's a full URL, including the http:// and your domain name, just like in the example above.)
You can add more than one sitemap if you need to. For instance, you might have a sitemap automatically generated by your blog platform, another one from your forum platform, and another one for your regular website. Just duplicate that line of code, and change the URL to your other sitemap. Add as many as you like!
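You can check that a crawler would pick up every Sitemap line with the same standard-library parser: since Python 3.8, RobotFileParser.site_maps() returns the Sitemap URLs it found, in the order they appear. The robots.txt below is hypothetical, listing separate sitemaps for a blog, a forum and the main site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one Sitemap line per generated sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/blog/sitemap.xml
Sitemap: http://www.example.com/forum/sitemap.xml
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Returns a list of the Sitemap URLs, or None if the file has none.
print(parser.site_maps())
```

The parser treats Sitemap lines as independent of any User-agent group, which is why their position in the file doesn't matter.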
Don't try to use your sitemap to hide information on your website. You might be tempted to leave certain pages out of your sitemap so that the search engines don't find them, but it doesn't work that way: they might find them anyway, for example by following links from other pages.
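If you don't want the search engines to crawl certain pages on your website, you need to use your robots.txt file. (Strictly speaking, robots.txt stops polite crawlers from fetching a page; a blocked page can still turn up in the listings if other sites link to it, so for pages that must stay out of the index, a “noindex” robots meta tag is the more reliable tool.)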
In this lesson we've looked at Sitemaps, what they are, how they're useful and how to create them. We also looked at the contents of an XML sitemap, and different ways you can submit your sitemap to the search engines.