Why CAPTCHA are not a security measure

CAPTCHA, those little groups of distorted letters or numbers we are being asked to recognise and type into forms, are becoming more and more common on the web, and at the same time, frustratingly, more and more difficult to read, like visual word puzzles.  The general idea behind them is that by filling them in, you are proving that you are a human being, given that the task would be difficult for a computer program to carry out.  CAPTCHA aim to reduce the problem of spam, such as comment spam on blogs or forums, by limiting the rate at which one can send messages or sign up for new accounts enough so that it becomes undesirable for spammers.


There is a problem, however, when these are talked about as a ‘security measure’.  They are not a security measure, and this misconception is based on a flawed understanding of security: that humans can be trusted whereas computers – bots – cannot.  CAPTCHA are not a bouncer; they cannot deny entry to any human that looks like they are going to start trouble.  They can only let in all humans.  If your security strategy is something along the lines of ‘If it’s a human, we can trust them’, then you are going to have problems.

Another problem with CAPTCHA is that they are relatively easy to exploit, by which I mean that it is easy to pass a large number of tests in order to spam more efficiently. While a single human can only look at a certain number of images per hour, a number of humans, with a lot of time on their hands, can look at thousands of them per hour.  If the internet has taught us anything, it’s that there are a lot of humans on the internet with a lot of time on their hands. So, while a CAPTCHA will slow one human down, it won’t slow hundreds of humans down.

The so-called ‘free porn exploit’, a form of relay attack, takes advantage of this. CAPTCHA images are scraped from a site and shown on the attacker’s site. On the attacker’s site is a form instructing its visitors to identify the letters in the image, often by promising access to something such as free porn. All the attacking computer needs to do is drive a whole bunch of people to that form and then use all the resulting solutions to carry out the intended dirty work at the original site.

It doesn’t have to be porn, of course – that is just a popular way of illustrating this form of circumvention.  Any time a human wants something, or even is a little bit bored, you can ask them to fill out a form. Get free jokes in your inbox, fill out this form. If you get many humans working against a CAPTCHA it makes the CAPTCHA ineffective.

Technology for solving CAPTCHA automatically, without requiring any human intervention at all, is also evolving at a high rate.  Scripts that can solve known CAPTCHA variants are becoming available all the time, and in response, new CAPTCHA variants are emerging too, each one more difficult to read than the last.  The computer power required to recognise characters visually is trivial compared to, for example, the computer power required to crack a good password from its hash, or to break an encryption scheme based on encrypted data.  The CAPTCHA that are unbreakable today are only unbreakable by obscurity; they are constructed differently enough from previous ones that current computer programs don’t recognise the letters and numbers in them.

What are alternatives to CAPTCHA?

The alternative will vary depending on what you are using it for.  If you are using CAPTCHA to reduce spam on your blog, they will probably continue to do that, though you may find yourself resorting to other options.  Bayesian or rule-based filtering, or a combination of both, are effective methods of reducing spam, and have the added benefit that they do not annoy the user or impede usability or accessibility like CAPTCHA would.
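To give a rough idea of what a combination of Bayesian and rule-based filtering might look like, here is a minimal sketch in Python.  It is not taken from any particular spam filter: the class NaiveBayesFilter, the function looks_like_spam, the link-counting rule, the thresholds and the four training comments are all made up for illustration, and a real filter would need far more training data and better tokenising.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Tiny naive Bayes classifier with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message known to be 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability (0.0 to 1.0) that the text is spam."""
        if 0 in self.message_counts.values():
            raise ValueError("train on at least one spam and one ham message")
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: the fraction of training messages carrying this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_scores[label] = score
        difference = log_scores["ham"] - log_scores["spam"]
        # Guard against overflow in math.exp for very lopsided scores.
        if difference > 700:
            return 0.0
        if difference < -700:
            return 1.0
        return 1.0 / (1.0 + math.exp(difference))


def looks_like_spam(text, bayes_filter, max_links=2, threshold=0.9):
    """Combine one hand-written rule (too many links) with the Bayesian score."""
    too_many_links = len(re.findall(r"https?://", text.lower())) > max_links
    return too_many_links or bayes_filter.spam_probability(text) > threshold


# Made-up training data, purely for demonstration.
demo_filter = NaiveBayesFilter()
demo_filter.train("cheap pills buy now", "spam")
demo_filter.train("buy cheap watches now", "spam")
demo_filter.train("great article thanks for writing", "ham")
demo_filter.train("I disagree with your point about security", "ham")

print(demo_filter.spam_probability("buy cheap pills"))
print(looks_like_spam("nice http://a http://b http://c", demo_filter))
```

The point of the sketch is that the commenter never sees any of this; the filtering happens after the form is submitted, so nobody has to squint at distorted letters.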

If you are using CAPTCHA as a security measure, you would need to ensure that you are only doing so based on a proper understanding of what kind of security they bring.  Certainly, they cannot do anything about keeping unauthorised or unwanted people out, as this is not what they are designed for.  They also have severe limitations in their ability to reduce spamming or flooding, due to them being relatively easy for a sufficiently organised attacker to bypass.

An earlier version of this article was published at SitePoint.com in November, 2005.

Security warnings that fail to be helpful: Who are you trying to confuse?

Warning: This file may contain malicious code, by executing it your system may be compromised.

If your average user is a computer security professional, this warning, found in a web application for corporate intranets, may be appropriate.  But does the average person want to understand the concepts of ‘malicious code’ or a ‘compromised’ system?

Wouldn’t they rather just be gently reminded that you can’t trust every file that you download on the web?

Let’s consider for a second that a user sees the original warning and doesn’t ignore it.  They may ask, ‘what is malicious code?’  Well, it is code, erm …  Well, computer programs are written in what’s called code … and sometimes that code might be erm … written to do bad things to your computer.  If they are not confused enough already they might try and ask what a compromised system is.  And no, it doesn’t really have much to do with a ‘compromise’ on anyone’s part.

If you really want to scare people, you could at least use terms that are more likely to be widely understood, like ‘files may contain viruses!’  ‘Be careful!’

Writing language that people are likely to understand is not dumbing down.  It would be dumbing down if you removed some of the most relevant and important details from your message, leaving your users feeling cheated because details were withheld.  But in this case, the fact that software is made up of code, and some code may be a bit nefarious, or the concept of a compromised system, detracts from the most important detail of the message, which is that it might not be safe to trust the file you are downloading.  There are also some clearer, more concise alternatives, such as the word ‘virus’.  This word has come to collectively signify everything nasty that you could let loose on your computer without meaning to.

Telling people not to get phished

If you provide users with a password, you should probably think about telling them how to keep their password safe.  Teaching users how to avoid being tricked into giving away their account data – being ‘phished’ – can be difficult.

Social engineering is a method of obtaining access to a secured system by exploiting a person’s trust.  It consists of deceiving a person into granting access to a system by some sort of pretense.  For example, let’s say I receive a rather desperate sounding phone call from an intern over in IT who has screwed up and lost their password and needs desperately to fix some problem for their boss.  They know I have an account and are hoping I would be so kind as to log in for them, something which I might be happy to do for a colleague.  However, the person on the phone is not an intern in IT at all, and doesn’t even work for the company.  What’s more, the second I have given them access via my own username and password, all of the security precautions are absolutely useless; the attacker has gained access to the system they wanted to access.  What if they pretended to be working at the bank?  They might goad me into letting them empty my bank account.

A second example of a social engineering attack is to exploit a person’s guilt – to make the person believe that they have been caught doing something wrong and may get in a lot of trouble if they do not cooperate.  This ‘cooperation’ may involve handing over their personal details.  This kind of attack can even work if the victim did not do anything wrong; the act of being ‘accused’ can put someone into a defensive state.  The desire to cover up any wrongdoing they have been accused of may distract them from the fact they are being conned.

The term phishing is used to describe such attacks when they are done over a message service, such as over email or text messages.  Phishing is also often done on a large scale; a would-be attacker sends an email to perhaps thousands of people pretending to be from the IT department, or a bank, or something, hoping that at least one person will fall for the scheme.  Some such schemes are wildly inventive, while there are just as many that are stock standard: ‘we need to confirm your account details’, or ‘we need to verify that your account is active’.

From the point of view of anybody involved in computer security, the fact that such attacks are so effective is depressing.  They are effective for many reasons.

One reason such attacks are effective is that a system, like any security precaution, is only as strong as its weakest point.  In a large organisation in which lots of people have access to a system, only one person needs to slip up and accidentally give their username and password to the wrong person in order for the system to be compromised.

Another reason is that the users of a system may, being less confident with technology, be naturally inclined to trust and be a little fearful of somebody who both seems to know a lot more about technology and is in a position of authority; for example, someone from an IT department, or law enforcement, or who has access to their bank account.  The consideration as to whether or not the person who contacted them is legitimate takes second place to the desire to comply with this person who seems so much more knowledgeable about the system.

It may also be that people don’t realise that computer security does not stop at some unseen attacker trying to guess or steal your password; that in large part an attacker can just walk right up to you and ask for it.

So, what do we tell the users?

Systems administrators often use the phrase ‘we will never ask for your password’.  This is a good message, because it at least signals to users that there may be nefarious motivations behind someone official asking you to confirm your password.

However, in most cases where someone is duped into giving their account information, they actually believe the person who has contacted them is legitimate.  The phrase ‘we will never ask for your password’ can quickly develop exceptions to the rule; an attacker might say ‘Oh, but our systems are down and we have to log people in manually today.’  As it is coming from a person genuinely believed to be legitimate, such an exception is easily accepted to be true both because it is plausible, and because the victim trusts the attacker to know more about the issue than they do.

I think that users should be instructed that if they are ever asked for their password, even by genuine system administrators, they should not give it over the phone or in reply to the email.  Instead, the receiver should call back the company on the known correct phone number and then give the password.  Let’s say that I call you up and tell you that we are in the process of deleting unused accounts and we need your password to confirm whether your account is used or not.  If you truly believe that my story is legitimate, you may ignore the advice that we never ask you for your password, because my story seems like a plausible reason for an exception to the rule.  But if you have been told that you should always call me back when asked for a password, you may be less likely to be convinced by my insistence that that isn’t necessary.  I might say that you won’t be able to call me back if you tried, or that the matter is urgent, but this may raise more red flags.

In terms of email phishing, too, we can instruct users never to click through a link to a site on which they have an account; instead, should they wish to visit the site they should type the site address or name into their browser.

Whether this is all effective is speculation, and it must still be remembered that no matter how security conscious an organisation is as a whole, it only takes one weak link: one uninformed or absent-minded person to slip up and allow a breach of security.

RSS feeds not fit for human consumption

It’s not feeds that I have a problem with, just using the term ‘RSS feeds’ or ‘RSS’ to describe them.

The term ‘RSS’ is hairy to begin with.  It isn’t sure if it should stand for ‘Really Simple Syndication’, ‘Rich Site Summary’ or ‘RDF Site Summary’.  That third one, with the nested acronym, is particularly hideous.  Passable for people who work with RDF perhaps, but that isn’t many people these days.  Besides which, ‘syndication’, and ‘site summary’ just don’t seem to convey the right idea to me.  They don’t reach out and tell me about grabbing headlines and bits of articles from a site and viewing them in other ways.  ‘Site Summary’ is a fairly vague term which could just as easily refer to a website’s ‘About Us’ page, and ‘Syndication’ is not really what feeds are used for these days.

Then there’s the issue that not all feeds are RSS, and not even all RSS is RSS.  RSS is a name used by two separate, competing and incompatible formats (or more if you count previous versions which are not forwards-compatible).  RSS is therefore not only useless in referring to the concept of a feed, but it’s useless in referring to a particular format of feed.  Yet another format is called Atom – not RSS at all.  The term ‘RSS’ unfairly excludes other implementations of the same concept.
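To make the format muddle a little more concrete, here is a small sketch in Python that guesses which family a feed belongs to by looking only at the root element of the XML.  The function name detect_feed_format and the labels it returns are my own invention for illustration.  It relies on the RSS 0.9x/2.0 family using an rss root element with a version attribute, RDF-based RSS (such as RSS 1.0) using an rdf:RDF root, and Atom using a feed root in the Atom namespace.  It is only a heuristic: an rdf:RDF root does not by itself prove that a document is a feed.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by Atom and by RDF-based RSS documents.
ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(xml_text):
    """Guess the feed family from the root element alone (a heuristic)."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # The RSS 0.9x / 2.0 family: <rss version="...">
        return "RSS " + root.get("version", "(no version given)")
    if root.tag == "{%s}RDF" % RDF_NS:
        # RDF-based RSS, for example RSS 1.0: <rdf:RDF ...>
        return "RDF-based RSS"
    if root.tag == "{%s}feed" % ATOM_NS:
        # Atom: <feed xmlns="http://www.w3.org/2005/Atom">
        return "Atom"
    return "unknown"


print(detect_feed_format('<rss version="2.0"><channel/></rss>'))
print(detect_feed_format('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
```

Three different answers for what a reader would think of as the same thing, which is rather the point: software has to care about the difference, but the person subscribing shouldn’t have to.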

Feeds are being increasingly used by web users due in part to better integration of feed readers or subscription mechanisms for feeds into browsers.  But along with this we need to use an appropriate name for them.

I am a fan of the term ‘web feeds’.  Firefox 2.0 used the term ‘feed’ as in ‘subscribe to this feed’.  The upcoming Firefox 3.0 gets more specific by calling them ‘web feeds’.  Internet Explorer 7.0 simply calls them ‘feeds’.  Opera 9.x muddies things by alternating between the terms ‘feeds’, ‘subscriptions’ and even ‘newsfeeds’.  And last but not least, Safari 3.1 refers to them as ‘RSS’.  Not even ‘RSS feeds’ – just ‘RSS’.

Google Reader simply calls them ‘subscriptions’ as far as I can tell, which is a decent term.  In other locations Google also uses the term ‘feeds’.  Wikipedia’s main page about feeds is now called ‘Web feed’.

With the exception of Safari, then, the major browsers and the other companies I mentioned have all opted to avoid the technically vague and misleading ‘RSS’ term and go with a more general term for the concept, with ‘feed’ by far the most popular, followed by ‘subscription’ and trailed by variants upon the word ‘feed’ such as ‘web feed’ or ‘newsfeed’.

So, is ‘feed’ a suitable term?  The word itself doesn’t describe the function; feed could easily be something I give to an animal.  The usage of the word seems to come from the context of radio or television broadcasting, where a ‘feed’ is some content that has been sourced or ‘streamed’ from another network.  It’s not an obvious link, to me at least, but once realised the analogy holds up.  I can subscribe to a feed of content sourced from another website.

What is certain is that the term ‘RSS’ really has to go.  It isn’t specific enough to be used as a technical term because it could refer to one of multiple competing formats.  At the same time, it isn’t inclusive enough as a general term as there are feeds that are not actually using any RSS-named technology.  With the exception of Safari, the term ‘RSS’ is not exposed to end users in any of the major web browsers, which instead opt for the more general ‘feed’ or ‘subscription’.  Most of all, it’s a confusing, alienating three-letter acronym that doesn’t become more self-explanatory after expanding it into any of its many alternative backronyms.

3D web browser shows the future of the web

Tired of viewing boring flat web pages on a boring flat screen? Well, the future looks pretty sweet. Now you’ll be able to zoom around your web pages in pseudo-3D space as if you are Superman flying around in a world of flat plastic billboards, stopping to peruse them from a funny angle.

This futuristic vision comes from an article about the SpaceTime browser, which even has screenshots of what the 3D plastic web pages look like.

It isn’t the first of April and this is a real, downloadable product, so maybe I’ll have to actually address some of the things wrong with this glowing newsvertisement and prediction of the future of the web.

It looks like there’s a lot of screen space dedicated to displaying empty space, a faded grey-black gradient in front of which the web pages you’re viewing ‘float’. Web designers are forever trying to fit more onto their pages, for better or for worse, and as monitor resolutions increase, web pages will try to fill them as much as possible. Browser designers think (and argue) long and hard about how they can make better use of screen space by removing unnecessary elements. That’s why the web page in Firefox goes all the way to the edge of the screen, the tab bar and scroll bar are hidden when not needed, and Internet Exploder 7 does away with the menu bar.

When I am browsing with multiple tabs, the tabs are lined up from left to right in a linear fashion along a bar. I don’t have to remember the vertical and horizontal position, depth and angle of each tab. In 3D if I move my viewing position, particularly angle, in the 3D space then I’ll be completely lost. This happens whenever I click on a different tab. More wasted space, too: each tab has its own fat title bar and close button.

Each floating ‘tab’, which is the wrong word to use in this 3D environment, is fairly small. A ‘zoom’ function enables you to make it bigger, but this just zooms into the same small window onto a web page; it doesn’t show more of a web page.

Am I expected to read any text on one of those ‘other’ tabs (or search results) which is floating away in the ‘distance’? It looks like they’re there just to remind you that you have other stuff open, which can be done a lot better if it is done sparingly and isn’t constantly distracting me from what I am doing on my chosen tab and taking away valuable screen space.

Thankfully, I can ‘maximise’ a tab, and it will fill the screen like a normal web browser. For a little while. As soon as I go and open something in a new tab, or switch tabs, or go ‘back’, everything goes back into 3D view and my pages get really small and text becomes fuzzy and most of the screen is filled with blank space.

What is the point of using a 3D interface to display things which are inherently two-dimensional? It adds complexity to the simplicity and lets you get lost in a whole new dimension.

Why should a web browser include a ‘gravity’ toggle and buttons like ‘straighten up’, ‘fly’ and ‘reset scene’? Isn’t that just kind of insanely ridiculous? And what is the point of being able to ‘walk’ or ‘fly’ behind a web page? The back and sides of a web page are grey, by the way.  Maybe HTML will evolve so that someday we can put stuff on the back of a web page.  Not.

History is always repeating and people are always going to claim something is new when it isn’t. For decades people have envisioned future user interfaces as being inherently three-dimensional, but shown on two-dimensional media such as computer screens or big glass displays. It’s reflected in futuristic movies and TV shows too. It’s become such a trite view of the future that I have seen it used to comic effect, and to pretend that this prediction of future web browsing interfaces is in any way new and innovative is laughable.

It isn’t a 3D web browser. A 3D web browser would browse a 3D web, but the web is only 2D. It’s browsing a 2D web and hanging the two-dimensional web pages in a 3D space. At best that’s 2.5D. VRML was a 3D web and an alternative to the text and 2D graphics based HTML, but it is all but unused now. It didn’t turn out to be a handy way of representing things on the web. Now some people are claiming that Second Life is the future of the web, but it isn’t really going to take over from the web and isn’t intended to. It is an interesting virtual world to walk around in and interact with people, but what if you just want to look something up? I use the web so I don’t have to walk around looking for stuff in buildings or worry about the clothes I’m wearing.