Spam fighting today

Posted on February 27th, 2008 in General by robh

There are many books and articles on filtering unsolicited commercial email but most of these seem to be at least a couple of years old. Some of the best are closer to ten years old. The cat and mouse game between spammers and email administrators means that things are constantly evolving and I’ve long thought it would be good to see some hard figures for how effective different methods are. I will therefore try to present my current experience. This won’t be particularly scientific and applies more to smallish systems than email service providers. Hopefully it will help someone, though.

My first spam filter was a customised procmail filter, adapted from one I found on Oxford University’s IT support site (I can’t find it anymore, unfortunately). This searched for likely words in messages and even used scoring to assess how likely a message was to be spam. This actually worked reasonably well whilst I was at university, but when I started working at Claranet and began receiving a lot of mail for postmaster, hostmaster etc., I found that the amount of wanted mail was tiny compared to the amount of spam I had to wade through. I’d heard good things about SpamAssassin so decided to give it a try. I was impressed, and with minimal customisation this setup worked adequately for several years.

About 18 months ago, however, I noticed two things. Firstly, I was getting a lot of spam and deleting it all was getting to be a nuisance. Secondly, and perhaps more importantly, the load on my server was continually high because of all the SpamAssassin processes. Clearly I needed to start stopping some of this stuff before it even got to SpamAssassin. This, then, is my current setup.

DNSBLs

I reject around 500 messages per day based on DNS block lists. Nowadays I only use one, the Spamhaus Zen list. This combines some of the lists I used to use with Spamhaus’s policy-based list (which includes things like dynamic dial-up IPs). In the past I’ve also used SpamCop but found that it gave a lot of false positives. I also found SORBS effective but don’t like their de-listing policy, so I just use them for warnings.

Of course, many of the 500 rejected messages would probably also be picked up by other means, but I like the fact that minimal resources are used at my end and that if there is a false positive the sender gets a quick and obvious message indicating what the problem is.
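
For anyone who hasn't looked at how a DNSBL check works under the hood, here is a rough Python sketch of the lookup. It is an illustration only, not what my mail server actually runs (the MTA does this itself). The function names are made up for this example, and I'm assuming the usual convention that a listed IPv4 address resolves to something in 127.0.0.x while an unlisted one doesn't resolve at all.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name to look up: reversed IPv4 octets followed by the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if ip is not valid IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org", resolve=socket.gethostbyname):
    """Return True if the block list answers with a 127.0.0.x address for this IP.

    `resolve` is injectable so the logic can be exercised without network access.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # No such name (or the lookup failed): treat the address as not listed.
        return False
    # Assumption: only 127.0.0.x answers count as listings; anything else is ignored.
    return answer.startswith("127.0.0.")
```

Calling `dnsbl_query_name("192.0.2.1")` gives `1.2.0.192.zen.spamhaus.org`, which is the name the server would look up when a client connects from 192.0.2.1. Treating a failed lookup as "not listed" is a deliberate choice here: a DNS outage then lets mail through rather than rejecting everything.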

Message syntax checks

The next set of checks is also aimed at stopping mail at the SMTP level. There’s a good introduction to doing this in Exim here. Here’s a summary of what I’m using:

  • Check that sender address is valid (but without callback)
  • Check that the hostname used in the EHLO greeting is valid (and not something like friend)
  • Check that the hostname used in the EHLO greeting isn’t the server’s name
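
To make the two EHLO checks concrete, here is a small Python sketch of the logic. My real checks live in the Exim configuration, so treat this purely as an illustration: the function names, the messages and the example hostnames are all invented, and "valid" here just means "looks like a fully qualified name" (I've assumed bracketed address literals should be let through).

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_fqdn(name):
    """True if name has at least two well-formed dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)


def helo_problem(helo, own_names):
    """Return a rejection reason for the EHLO/HELO argument, or None if it passes."""
    if helo.startswith("[") and helo.endswith("]"):
        # Address literal such as [192.0.2.1]; not checked further in this sketch.
        return None
    if not looks_like_fqdn(helo):
        return "invalid hostname in EHLO greeting"
    if helo.rstrip(".").lower() in {name.lower() for name in own_names}:
        return "EHLO greeting uses this server's own name"
    return None
```

With `own_names=["mx.example.net"]`, a greeting of `friend` is rejected as invalid, `mx.example.net` is rejected for claiming to be the server itself, and `mail.example.org` passes.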

I also tried a few other things that didn’t work so well:

  • Delays in the SMTP process. I found that having a delay long enough to stop the spam also stopped some legitimate mail. This does seem to work fairly well for sites with lots of users to slow down dictionary attacks, however.
  • Checking the sender address using the callback – This did stop some spam but some legitimate servers are (incorrectly) configured to reject mail with a null sender, so I stopped doing this.

SpamAssassin

The remaining mail goes through SpamAssassin’s scoring system. I tend to find this detects around 10 messages per day. These go to a Spam folder, which I review manually.
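
The idea behind score-based filtering is simple enough to show in a few lines of Python. This is a toy sketch and not SpamAssassin's own code: the rule names and scores are hypothetical, and the only real detail I'm relying on is that SpamAssassin's default threshold is 5.0.

```python
def spam_score(hits, rule_scores):
    """Add up the scores of every rule that matched the message."""
    return sum(rule_scores[name] for name in hits)


def is_spam(hits, rule_scores, required=5.0):
    """A message is treated as spam once its total reaches the threshold."""
    return spam_score(hits, rule_scores) >= required


# Hypothetical rules: positive scores point towards spam, negative ones away from it.
EXAMPLE_RULES = {
    "SUBJECT_ALL_CAPS": 2.5,
    "MENTIONS_PILLS": 3.0,
    "SENDER_IN_ADDRESS_BOOK": -1.0,
}
```

No single rule decides the outcome. A message that trips both `SUBJECT_ALL_CAPS` and `MENTIONS_PILLS` reaches the threshold, while one that trips `SUBJECT_ALL_CAPS` but comes from a known sender stays well below it.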