Back toward the start of February, I wrote about the implementation and ramifications of the rel="nofollow" attribute on user-submitted content. Shortly afterward, I was prompted to write the rel="nofollow" follow-up entry as well. The idea back then was that website spam, in particular weblog spam, was rife and we (read: search engines) needed an answer. As a *cough*solution*cough*, some bright people thought that if they removed the reward (PageRank in Google, for instance), the spammers would stop.
I'm here today to tell you that it hasn't stopped; in fact, I'd nearly say that it has increased. In the grander scheme of things, I don't get a lot of spam. A useful by-product of that is that it's easy for me to gauge how much spam I'm getting. If I were receiving hundreds or thousands of spam messages daily, I'd be handling them through 'mass editing' methods – "select all, delete". Since I'm not, I get to see, glance at and sometimes even read the spam!
Over the past months, I've tried various spam defence mechanisms on the site. Some people have gone to extremes to implement some of the things I mentioned (ie: a 'set' of mechanisms that change each time you attempt to post) so as to require a human to make the post. These systems no doubt work very well; however, the problem I see with some of them is that they restrict users from commenting on your site. A friend of mine has a small blog hosted at Blogger – the problem is that her site requires that you be a member before you can post. This single thing alone has stopped me from leaving comments to date – I want to, but I just can't be bothered signing up to leave a comment. I'd consider myself someone who will go to fairly long lengths to get from A to B, but if I won't sign up for a dummy account on Blogger to post, that proves to me that some anti-spam techniques aren't stopping just the spammers.
One of the anti-spam techniques that I find works very well is keyword detection. If a spammer mentions a certain phrase in their spam, the comment is flagged for moderation before it will go live. I think one of the primary reasons this method is so easy to implement and so effective is that spammers utilise SEO techniques in their spamming. By that I mean they use, for instance, the same word or phrase over and over to help build their keyword importance and visibility. A practical example: compare 100 inbound links to a particular page on your site, each with different link text, versus the same 100 links all with the same link text. Since we know they rely on these SEO techniques, it works in our favour for simple detection. I've got a list of fewer than 50 words that I use to capture spam, and so far it is working out wonderfully.
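If you're curious what that looks like in practice, here's a minimal sketch of this kind of keyword moderation. The keyword list and function names are invented for illustration – this isn't my actual list or my actual code, just the general shape of the idea:

```python
# A small, hypothetical keyword list. In a real setup this would be
# tuned over time as new spam phrases show up in comments.
SPAM_KEYWORDS = {"casino", "poker", "mortgage", "pharmacy"}

def needs_moderation(comment_text):
    """Return True if the comment mentions any flagged keyword.

    Matching is case-insensitive, since spammers vary capitalisation
    freely while repeating the same phrases for SEO weight.
    """
    text = comment_text.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)

# A flagged comment is held for review rather than published:
if needs_moderation("Best CASINO payouts on the web!"):
    print("held for moderation")
```

Note the filter only holds the comment for review – it never silently deletes it – so a false positive costs a few seconds of moderation rather than a lost comment.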
This isn't earth-shattering news for most, but it might remind some people that simplicity is a beautiful thing. I often think we get caught up in overly complex systems to get from A to Z when there is often a shorter, simpler path available that we've overlooked or discounted previously.