Automated comment spam is a huge problem for anyone using any of the popular blogging programs. For anyone sick of comments about cheap Viagra or fake Rolexes, here are my suggestions.

1. Make it hard to identify your website as a weblog.

The web is a target-rich environment; a Google search for mt-comments.cgi gives 3.7 million potential targets from one popular piece of software alone. As with any automated attack, being one of many vulnerable sites is not a protection.

Ideally, you want to make it hard for spammers to find your blog without making it hard for everyone else.

Most of the default templates that come with blogging software have a “Powered by {SoftwareName}” link on every page. This is the first thing I would remove from any template, along with any direct links to the file that handles comments. Neither is something I want showing up in a Google search.
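
One way to check a template after editing it is to scan the rendered page for strings that give the software away. The following is only a rough sketch in Python: the function name and the list of fingerprint strings are my own assumptions for illustration, and you would need to adjust the list for whatever software you actually run.

```python
# Hypothetical fingerprint strings (assumptions for illustration only);
# replace them with whatever identifies the software you actually run.
FINGERPRINTS = ["powered by", "mt-comments.cgi"]


def find_fingerprints(page_html, fingerprints=FINGERPRINTS):
    """Return the fingerprint strings that appear in the page (case-insensitive)."""
    lowered = page_html.lower()
    return [f for f in fingerprints if f.lower() in lowered]


if __name__ == "__main__":
    sample = '<p>Powered by ExampleBlog</p><form action="/mt-comments.cgi">'
    print(find_fingerprints(sample))
```

An empty list only means that none of the listed strings were found, not that the page is impossible to identify as a weblog.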

Turning off the option to notify other websites when you update (weblog ping) also reduces spam, but it will reduce the number of new people coming to your site as well.

2. Make spam hard to automate.

Making people log in before leaving comments is the most obvious solution to this problem, but it will also reduce the number of legitimate comments that are left. I don’t like the idea of discouraging casual comments.

I also think that IP blacklists are a bad idea. They prevent regular users who share a proxy with a spammer from leaving comments. They also have the same problem as any blacklist – the centrally maintained database is subject to poisoning.

Captchas aren’t foolproof, but I think they are the best solution at this time. No blogging tools have included a captcha implementation in the main source tree, but they all seem to have this option as a plug-in or mod. The main objection to captchas is that they disadvantage blind people – this is coming from people who have no problem denying all requests from a class C IP address range. Surely they couldn’t object to captchas if users were given the alternative of logging in if they couldn’t handle it.

A very temporary fix that seems to be effective against the majority of the current generation of spam bots is renaming the file that is used to post comments. While it is simple to write a bot that will handle this, until more people do this the spammers may not bother. It’s like the joke about the two men who stumble upon the hungry mountain lion. “I don’t have to outrun the lion, I just have to outrun you.”

3. Filter anything that does get through.

With the number of spam comments a well connected blog receives, sending all new comments to a queue for approval before being displayed is no longer a practical solution.

The current version of WordPress (version 1.2.2) has an option to automatically moderate comments that contain certain keywords or have more than a certain number of links. I would expect to see more detailed heuristic filtering like this in the future, with the obvious next step being Bayesian filtering.
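
To make the idea concrete, here is a minimal sketch in Python of the two rules described above: hold a comment if it contains a flagged keyword or more than a set number of links. This is not WordPress's code; the function name, the keyword list, and the link limit are all assumptions chosen for illustration.

```python
import re

# Hypothetical settings (assumptions, not WordPress's option names or defaults).
MODERATION_KEYWORDS = {"viagra", "rolex"}
MAX_LINKS = 2

LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z0-9]+")


def needs_moderation(comment, keywords=MODERATION_KEYWORDS, max_links=MAX_LINKS):
    """Return True if the comment should be held for manual approval."""
    # Rule 1: the comment contains more than max_links links.
    if len(LINK_PATTERN.findall(comment)) > max_links:
        return True
    # Rule 2: a flagged keyword appears as a whole word (case-insensitive).
    words = set(WORD_PATTERN.findall(comment.lower()))
    return not keywords.isdisjoint(words)


if __name__ == "__main__":
    print(needs_moderation("Nice post, thanks!"))
    print(needs_moderation("Cheap Viagra at http://pills.example"))
```

Whole-word matching keeps the false positive rate down but is easy to evade with misspellings, which is one reason a statistical filter looks like the natural next step.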

4. Eliminate reward.

The main reason blogs – even those with very low traffic – receive comment spam is Google PageRank. The more pages that link to a website, the higher its PageRank, and the higher it appears in Google’s results for related search terms. Comment spam is a way of artificially inflating a site’s PageRank.

Some blogging software has started to redirect links in comments through an intermediate page, so they will receive no PageRank benefit. Unfortunately, spambots are too dumb to realise that you are doing this, and post spam comments anyway. Until everyone does this, it will not discourage spammers.
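
As a rough illustration of the redirect approach, the Python sketch below rewrites absolute links in a comment so that they point at an intermediate page on your own site. The `/redirect?url=` endpoint and the function name are hypothetical, and the regular expression is a simplification: it only handles double-quoted `href` attributes and does not decode HTML entities inside URLs.

```python
import re
from urllib.parse import quote

# Hypothetical redirect endpoint on your own site (an assumption for this sketch).
REDIRECT_PREFIX = "/redirect?url="

# Matches double-quoted href attributes that hold absolute http(s) URLs.
HREF_PATTERN = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def redirect_links(comment_html, prefix=REDIRECT_PREFIX):
    """Rewrite absolute links so they go through the intermediate page."""

    def replace(match):
        # Percent-encode the whole target so it survives as one query value.
        target = quote(match.group(1), safe="")
        return f'href="{prefix}{target}"'

    return HREF_PATTERN.sub(replace, comment_html)


if __name__ == "__main__":
    print(redirect_links('<a href="http://spam.example/pills">cheap pills</a>'))
```

The redirect page itself is not shown here; it would need to decode the `url` parameter and forward the visitor to it.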

 

Published On: January 7, 2005

3 Comments

  1. Anonymous January 27, 2005 at 3:19 pm

    Check out the Google Blog entry on their solution to prevent comment spam.

    If you’re a blogger (or a blog reader), you’re painfully familiar with people who try to raise their own websites’ search engine rankings by submitting linked blog comments like “Visit my discount pharmaceuticals site.” This is called comment spam, we don’t like it either, and we’ve been testing a new tag that blocks it. From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won’t get any credit when we rank websites in our search results. This isn’t a negative vote for the site where the comment was posted; it’s just a way to make sure that spammers get no benefit from abusing public areas like blog comments, trackbacks, and referrer lists.

    We hope the web software community will quickly adopt this attribute and we’re pleased that a number of blog software makers have already signed on.

  2. Anonymous January 28, 2005 at 2:00 pm

    There is a Kuro5hin discussion on this at http://www.kuro5hin.org/story/2005/1/19/35627/2443
    which says that Google’s solution will make things worse.

    A comment points out that “It’s irrelevant to Google whether this stops comment spam. What it will stop is comment spam interfering with Google’s page rankings, so it will improve Google’s page results.”

    The nofollow attribute means that we will be able to criticise and link without giving pagerank. http://radio.weblogs.com/0001011/2005/01/18.html#a9229

  3. Stuart Moncrieff June 12, 2005 at 11:36 pm

    This is the only page on my website that attracts spam, as it contains several keywords that blog spammers search for when looking for fresh targets.

    Consequently, I have had to disable comments for this page.

    If you have something valuable to add, or would like to give me some feedback, please send me an email.
