If you've been plying the waters of the web design community for a while, you might have heard people talking about your "text to code" ratio, and how it can affect your SEO rankings.
This is the idea that the more text and less "code" you have on your page, the easier it will be for the search engines to understand, and the better they will index it as a result. You'll see a lot of discussion on SEO forums and blogs about this, and you'll even find text to code ratio calculators online as well. You can just plug in your URL and it'll tell you your ratio.
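Online calculators don't all agree on what counts as "text" and what counts as "code". Here is a rough sketch of how such a calculator could work. It assumes the ratio is visible text characters divided by total HTML characters, with script and style contents counted as code. `text_to_code_ratio` is a hypothetical name, not any particular tool's method:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def text_to_code_ratio(html: str) -> float:
    """Hypothetical metric: visible-text characters as a % of all characters."""
    if not html:
        return 0.0
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so indentation isn't counted as "text".
    text = " ".join(" ".join(parser.parts).split())
    return 100.0 * len(text) / len(html)


page = "<html><body><h1>Hello</h1><p>Some text here.</p></body></html>"
print(f"{text_to_code_ratio(page):.1f}%")
```

Different tools make different choices here (whether whitespace, alt text or inline scripts count), which is one reason the same URL can get different scores from different calculators.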
There's a nice quote from Adam Lasnik, an SEO strategist at Google, where he says that, as of April 2007, Google didn't have the ability to use this as a factor in their search rankings. And when you think about it, his reason makes a lot of sense. He says:
"There are a ton of very high quality sites [...] from universities, from research institutions, from very well respected ecommerce stores [...] that have really crufty sites, and sites that won't validate. [...] And, because this is quality content, we really can't use that as an effective signal in search quality. So, you can quote me as saying, I would be thrilled, it would make my day if people would decruft their sites, but it's not going to directly affect their Google ranking."
That said, there seems to be a lot of anecdotal evidence out there that cleaning up your code can help your search engine rankings.
I think this is a situation where people see the results and automatically assume there's a connection between the amount of code on the page and their search engine rankings. Personally, I don't think reducing the amount of code makes a difference in itself. It can, however, cause a number of other things to happen that might help boost your search engine rankings:
1) In cleaning up your code, you're likely to switch from:
<font size="36" color="red"><b><i>Here's my great title with my keywords in it!</i></b></font><br><br>
to:
<h1>Here's my great title with my keywords in it!</h1>
Yes, you're reducing the amount of code, but you're also beginning to use more semantic code that actually tells the search engine spiders what this part of the page is. They now know that it's a heading, and your most important heading to boot. They see your keywords in your most important heading and they think "this page is relevant to these keywords".
You're likely to make a bunch of changes like this that make your page easier for the search engine spiders to understand. They know what the important parts of your page are, and because you've cleverly put your keywords in those important parts, those keywords will probably help you up the rankings.
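Here's a toy "spider" that only looks for h1 to h6 elements, to make the difference in point 1 concrete. It's a hypothetical simplification, since real search engines don't publish their parsers. It shows that the <font> version gives a parser no heading to find, while the <h1> version hands it the keywords directly:

```python
from html.parser import HTMLParser


class HeadingFinder(HTMLParser):
    """Hypothetical toy spider: records the text inside <h1>...<h6> elements."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (tag, text) pairs
        self._current = None  # heading tag we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def find_headings(html):
    finder = HeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.headings


old = ('<font size="36" color="red"><b><i>Here\'s my great title with my '
       'keywords in it!</i></b></font><br><br>')
new = "<h1>Here's my great title with my keywords in it!</h1>"

print(find_headings(old))  # []
print(find_headings(new))  # [('h1', "Here's my great title with my keywords in it!")]
```

The presentational tags (font, b, i) say how the text should look, not what it is, so a parser that cares about structure has nothing to latch onto.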
2) If you've got a menu bar on the left of your page, chances are it comes first in your HTML source, so the search engines are seeing your navigation as the first real content on your page. If you take the view (as we do) that the first 50 words or so on your page are quite important, then having your navigation in those first 50 words doesn't seem so smart. When cleaning up your code, you may move to a DIV-based layout where you can make sure your content comes first in the source, not your navigation. This might also help.
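Here's a sketch of why source order matters for point 2. It assumes, as a simplification, that a crawler reads text in the order it appears in the HTML. Pulling the first few words out of two hypothetical layouts shows the navigation crowding out the content in the table version:

```python
from html.parser import HTMLParser


class WordCollector(HTMLParser):
    """Collects text words in source order (script/style skipping omitted for brevity)."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())


def first_words(html, n=50):
    collector = WordCollector()
    collector.feed(html)
    collector.close()
    return collector.words[:n]


# Hypothetical table layout: the navigation cell comes first in the source.
nav_first = """<table><tr>
<td>Home About Services Portfolio Blog Contact</td>
<td><h1>Blue widgets</h1><p>Our blue widgets are hand-made.</p></td>
</tr></table>"""

# Hypothetical DIV layout: content first in the source; CSS can still
# display the navigation on the left.
content_first = """<div id="content"><h1>Blue widgets</h1>
<p>Our blue widgets are hand-made.</p></div>
<div id="nav">Home About Services Portfolio Blog Contact</div>"""

print(first_words(nav_first, 5))      # ['Home', 'About', 'Services', 'Portfolio', 'Blog']
print(first_words(content_first, 5))  # ['Blue', 'widgets', 'Our', 'blue', 'widgets']
```

The markup and page names are made up for illustration. The point is that both pages can look identical in the browser while presenting their words in a different order to anything reading the source.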
3) If you've got really, really bad HTML with lots of unclosed tags and confusing scripts, then it's possible you'll be causing the spiders to blow smoke out their ears when they try to read your page. If you clean it up, you're likely to fix those errors, thus helping your page to be indexed correctly.
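For point 3, a very rough way to spot unclosed tags is to keep a stack of open elements. This is a hypothetical helper, not a validator. HTML legitimately lets you omit some end tags (such as </p> and </li>), so treat anything it reports as a hint, and use the W3C Markup Validation Service for a proper check:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: reports mismatched closing tags and tags left open."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements
        self.problems = []  # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax like <br/>: nothing to push or pop.
        pass

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Anything opened after `tag` and not yet closed was left open.
            while self.stack[-1] != tag:
                self.problems.append(f"<{self.stack.pop()}> not closed before </{tag}>")
            self.stack.pop()
        else:
            self.problems.append(f"stray </{tag}> with no matching open tag")


def check_tags(html):
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    leftovers = [f"<{tag}> never closed" for tag in reversed(checker.stack)]
    return checker.problems + leftovers


print(check_tags("<div><p>Hello <b>world</p></div>"))
print(check_tags("<div><p>Hello <b>world</b></p></div>"))
```

The first call reports the unclosed <b>; the second, tidied-up version reports nothing.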
4) Lastly, and perhaps most importantly, by “de-crufting” your site you reduce your page loading time. Your pages are smaller and quicker to load. This means that when somebody clicks through to your page from the search engines, there's a much lower chance that they will get bored waiting for the site to load and hit the “back” button in frustration. This doesn't just mean that you get more actual visitors... it also reduces your “bounce rate” (the proportion of visitors who arrive at your site and “bounce” straight back out without sticking around to take a look). Some search engines pay attention to this, because a high bounce rate suggests your site isn't interesting to users.
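Bounce rate is a proportion, so it's usually reported as a percentage. This sketch assumes the common analytics definition: sessions that viewed only one page, divided by all sessions. The arithmetic is simply:

```python
def bounce_rate(single_page_sessions: int, total_sessions: int) -> float:
    """Percentage of sessions that left after viewing only one page."""
    if total_sessions == 0:
        return 0.0  # no visits, nothing to measure
    return 100.0 * single_page_sessions / total_sessions


# Hypothetical numbers: 1,000 visits, 620 of which left without a second page view.
print(f"{bounce_rate(620, 1000):.1f}%")  # 62.0%
```

Exactly how each analytics package defines a "session" varies, so compare bounce rates only within the same tool.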
So what's the verdict? In my humble opinion, text-to-code ratio is not important for SEO in the way most people think. If your page loads reasonably quickly (without a ridiculous wait as it processes a hundred server requests), then you're probably fine with a reasonable amount of cruftiness. But there are lots of things to be gained from de-crufting (for example, coolness), and one of the side effects may just be better search engine rankings.