CAPTCHA, those little groups of distorted letters or numbers we are being asked to recognise and type into forms, are becoming more and more common on the web, and at the same time, frustratingly, more and more difficult to read, like visual word puzzles. The general idea behind them is that by filling them in, you are proving that you are a human being, given that the task would be difficult for a computer program to carry out. CAPTCHA aim to reduce the problem of spam, such as comment spam on blogs or forums, by limiting the rate at which one can send messages or sign up for new accounts enough so that it becomes undesirable for spammers.
There is a problem, however, when these are talked about as a ‘security measure’. They are not a security measure, and this misconception is based on a flawed understanding of security: that humans can be trusted whereas computers – bots – cannot. CAPTCHA are not a bouncer; they cannot deny entry to any human that looks like they are going to start trouble. They can only let in all humans. If your security strategy is something along the lines of ‘If it’s a human, we can trust them’, then you are going to have problems.
Another problem with CAPTCHAs is that they are relatively easy to exploit, and by this I mean, pass a large number of tests easily in order to spam more efficiently. While a single human can only look at a certain number of images per hour, a number of humans, with a lot of time on their hands, can look at thousands of them per hour. If the internet has taught us anything, it’s that there are a lot of humans on the internet with a lot of time on their hands. So, while a CAPTCHA will slow one human down, it won’t slow hundreds of humans down.
The so-called ‘free porn exploit‘, a form of relay attack, takes advantage of this. CAPTCHA images are scraped from a site and shown on the attacker’s site. On the attacker’s site is a form instructing its visitors to identify the letters in the image, often by promising access to something such as free porn. All the attacking computer needs to do is drive a whole bunch of people to that form and then use all the resulting solutions to carry out the intended dirty work at the original site.
It doesn’t have to be porn, of course – that is just a popular way of illustrating this form of circumvention. Any time a human wants something, or even is a little bit bored, you can ask them to fill out a form. Get free jokes in your inbox, fill out this form. If you get many humans working against a CAPTCHA it makes the CAPTCHA ineffective.
Technology for solving CAPTCHA automatically, without requiring any human intervention at all, is also evolving at a high rate. Scripts that can solve known CAPTCHA variants are becoming available all the time, and in response, new CAPTCHA variants are emerging too, each one more difficult to read than the last. The computer power required to recognise characters visually is trivial compared to, for example, the computer power required to crack a good password from its hash, or to break an encryption scheme based on encrypted data. The CAPTCHA that are unbreakable today are only unbreakable by obscurity; they are constructed differently enough for previous ones that current computer programs don’t recognise the letters and numbers in them.
What are alternatives to CAPTCHA?
The alternative will vary depending on what you are using it for. If you are using CAPTCHA for reducing spam on your blog, then they will probably continue to do so, though you may find yourself resorting to other options. Bayesian or rule-based filtering, or a combination of both, are effective methods of reducing spam, and have the added benefit that they do not annoy the user or impede usability or accessibility like CAPTCHA would.
If you are using CAPTCHA as a security measure, you would need to ensure that you are only doing so based on a proper understanding of what kind of security they bring. Certainly, they cannot do anything about keeping unauthorised or unwanted people out, as this is not what they are designed for. They also have severe limitations in their ability to reduce spamming or flooding, due to them being relatively easy for a sufficiently organised attacker to bypass.
An earlier version of this article was published at SitePoint.com in November, 2005.