Blog


Musings from Mike Atkinson on Internet strategy, usability, and more...

Dealing with form spam

I read this post on a listserv yesterday, written by my good buddy (and ubersmart) Paul Kulp. The discussion was on form spam (a blight on society!). Fortunately, Paul’s company has done a lot of research into it, and he generously shared his insights. He agreed to let me post it here, too. (This may seem more techie than what I usually post, but it’s ultimately a usability issue…)

Paul writes:

In our experience, you should be slow to jump to a CAPTCHA solution. For those who don’t know, CAPTCHA refers to those images where you have to type in the crazy letters.

A properly implemented CAPTCHA is a very powerful tool for stopping spam, but it is not foolproof. For instance, it cannot stop spam sweatshops (where low-wage workers are paid to type in the letters from CAPTCHA images). It also has the significant drawback of being annoying for users.

IMHO, there are other approaches that should be tried first. These come from considerable experience and from analysis of raw web logs to determine how spammers are operating.

One thing of note: once spam engines are sending to your form, they will keep trying to submit to it. They have stored the address and form fields and will continue hammering it forever. The only way to shake them off is to change your form submission process (the address, the field names, or both) so that the stored submissions won’t work anymore.
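As a rough illustration of changing the submission process, here is a minimal Python sketch that derives the form's field names from a secret, so that submissions recorded against the old names stop decoding once the secret changes. This is only one possible way to do it, not necessarily what Paul's team uses; `FORM_SECRET`, `field_name`, and `decode_submission` are hypothetical names made up for this example.

```python
import hashlib
import hmac

# Hypothetical secret: change it whenever recorded submissions should stop working.
FORM_SECRET = b"change-me"


def field_name(logical_name: str, secret: bytes = FORM_SECRET) -> str:
    """Derive the name used in the HTML form from a logical name like 'email'."""
    digest = hmac.new(secret, logical_name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "f_" + digest[:12]


def decode_submission(posted: dict, logical_names: list, secret: bytes = FORM_SECRET):
    """Map posted fields back to logical names.

    Returns None when an expected field is missing, which is what happens
    when a spam engine replays a form layout it stored earlier.
    """
    result = {}
    for name in logical_names:
        key = field_name(name, secret)
        if key not in posted:
            return None
        result[name] = posted[key]
    return result
```

The page template would call `field_name("email")` when rendering each input, and the handler would call `decode_submission` on the posted data and drop anything that comes back as `None`.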

This is a brief list of approaches that can be used to identify/avoid spam, in the rough order they should be attempted. Feel free to suggest anything that’s missing.

1. get your form off of search engines (unfortunately, this doesn’t help once they’ve found you).

2. use JavaScript – there are a number of variations on this, but basically you present a form to the browser that is somehow broken, then fix it with JavaScript. Non-JavaScript users will be treated as spam, but in many cases that is acceptable. You can also add an extra step in that case to support non-JavaScript users.

3. monitor content – in many cases, you can look at the content that’s submitted to determine if it’s spam. For instance, you can often discard submissions that have HTML in inappropriate fields.

4. use sessions – it turns out the spammers (especially sweatshops) are very interested in conserving bandwidth. They also don’t follow the normal path through your site. Our examination of stats implies that the sweatshops are carefully firewalled to keep workers from viewing any images they don’t need. We have had success on some sites by noting in the session whether the user has been through the appropriate lead-up pages – or whether they have loaded a particular template image.

5. CAPTCHA – (refer to discussion above) this is really the brute force approach and is generally the costliest to implement.
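To make the JavaScript, content, and session checks from the list above a bit more concrete, here is a minimal server-side sketch in Python. It assumes the form and the session arrive as plain dictionaries; the `js_token` field (served empty and filled in by a script), the `saw_form_page` session flag, and the list of plain-text fields are all hypothetical names invented for this example, not anything from Paul's actual setup.

```python
import re

# Matches anything that looks like an HTML tag, e.g. <a href="..."> or </b>.
HTML_TAG = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")


def has_html(value: str) -> bool:
    """True if the text contains something shaped like an HTML tag."""
    return bool(HTML_TAG.search(value))


def looks_like_spam(form: dict, session: dict,
                    plain_fields=("name", "email", "phone")) -> bool:
    """Apply the cheap checks before ever reaching for a CAPTCHA."""
    # JavaScript check: the served form has an empty js_token field, and a
    # script on the page fills in the value stored in the visitor's session.
    expected = session.get("js_token")
    if not expected or form.get("js_token") != expected:
        return True

    # Content check: HTML in fields that should never contain it.
    if any(has_html(form.get(field, "")) for field in plain_fields):
        return True

    # Session check: the visitor never loaded the lead-up page
    # (or the template image) that sets this flag.
    if not session.get("saw_form_page"):
        return True

    return False
```

A submission that fails any of these checks can be quietly discarded, or routed to the extra step mentioned above for visitors without JavaScript.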

Paul Kulp, Ph.D.
Chief Information Officer
WSG
4 Interstate Ave Albany, NY 12205
518.435.0682
http://www.wsg.net