
AI versus CAPTCHA: The silent war training robots to think

  • May 11

The sci-fi image of the future handed down to us by Hollywood has invariably been one in which robots perform feats of physicality for humans. From Robocop to WALL-E, tomorrow is a story of hardware replacing biology — making humans safer and their lives more comfortable.


And yet reality has turned out to be the inverse of this picture: artificial intelligence (AI) is fuelling a cognitive revolution that seeks to outsource our thinking. This turn of events is yielding strange economies in which bots employ humans to do the physical work they cannot. This dystopia is laid bare in the rise of CAPTCHA farms, which use low-cost human labour to solve large volumes of CAPTCHA challenges on behalf of bots, enabling bad actors to bypass organisations’ security checks.


But at what cost do humans feed robots the data they need to understand text, images, and behaviour? What will organisations do if AI becomes so smart it can pass our most sophisticated Turing tests? How then would we distinguish between humans and machines online?


Passing the Turing test


A Completely Automated Public Turing test to tell Computers and Humans Apart, or CAPTCHA, is an online challenge-response test that requires users to interpret distorted text or pick out specific images from a handful of options. It protects websites from bot attacks, spam, and abuse by verifying that a user is human.


Invented in the early 2000s by the Guatemalan-born computer scientist and founder of Duolingo, Luis von Ahn, CAPTCHA was initially a solution to one of Yahoo’s biggest operational challenges of the time: preventing bots from creating millions of spam email accounts.


The task in front of von Ahn was to create an online test that could rapidly distinguish humans from computers. It needed to be passable by any human, irrespective of age, gender, education or language, yet gradable by a computer. This seemingly paradoxical feat was eventually achieved by testing the ability to read distorted text, something humans have had millennia to practise but which optical character recognition (OCR) software of the time handled poorly. Computers struggled to interpret text, especially if the image was noisy, blurry, or dependent on context.


So promising was von Ahn’s proposal that Yahoo adopted CAPTCHA challenges for its sign-up page, which proved highly successful in reducing the number of spam accounts. In the background, however, the data being entered by humans was slowly making computers smarter.


In 2007 a new, more complicated version of the test emerged, known as reCAPTCHA, which used two words for verification: one whose answer was already known, and one taken from a scanned document that software had failed to read. The test was solved so many times around the world that, according to Vox, a year’s worth of New York Times articles was being digitised every four days.


In 2009, Google acquired reCAPTCHA and began using the capability to digitise its scanned books and news archives. In the process, a library of distorted characters was built, enabling computers to extrapolate letters and words from new images. In short, reCAPTCHA taught computers how to read.


To gauge the effectiveness of existing reCAPTCHA solutions, Google conducted a test and published the findings in 2014. It announced that humans could interpret the most distorted CAPTCHA challenges with 33% accuracy, while AI could do so with 99.8% accuracy. Faced with this disparity, Google had to update CAPTCHA once more.


On the back of these results, Google released reCAPTCHA V2 in 2014, which used images instead of text. At that time, computers were not as good as humans at recognising complex images. However, this soon changed when Google leveraged its reCAPTCHA V2 data to train computers to identify real-world objects. For example, users’ identification of transport-related objects, such as cars, bicycles and traffic lights, started being used to help Google’s self-driving cars operate safely. It was also leveraged to improve the Google Maps service.


Soon enough, Google found that computers had become better than humans at identifying real-world objects, just as it had found in 2014 with text, so a third version of the test had to be designed: NOCAPTCHA. This most up-to-date iteration verifies humans based on their online behaviour, such as click rate or mouse movement. The test is invisible and runs in the background, sacrificing privacy for a frictionless user experience.
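
To make the idea of behaviour-based verification concrete, here is a deliberately simple sketch. It is not Google's method, which is not public; the function names, the 0.05 threshold, and the choice of signal (how evenly spaced a visitor's mouse events are) are all assumptions made for illustration. The intuition is that scripted input tends to arrive at machine-regular intervals, while human input is irregular.

```python
from statistics import mean, pstdev
from typing import List


def timing_regularity(timestamps_ms: List[float]) -> float:
    """Coefficient of variation of the gaps between consecutive events.

    Values near 0 mean the events are almost perfectly evenly spaced.
    (Hypothetical signal, chosen only for illustration.)
    """
    if len(timestamps_ms) < 3:
        raise ValueError("need at least three events")
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        raise ValueError("timestamps must span a positive duration")
    return pstdev(gaps) / average_gap


def is_suspicious(timestamps_ms: List[float], threshold: float = 0.05) -> bool:
    """Flag event streams whose timing is too regular to look human.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return timing_regularity(timestamps_ms) < threshold


# Perfectly even spacing looks scripted; jittery spacing does not.
scripted = [0, 10, 20, 30, 40]
human_like = [0, 35, 48, 120, 131]
print(is_suspicious(scripted), is_suspicious(human_like))
```

A production system would presumably combine many such signals, and a bot that adds random jitter would defeat this one on its own.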


For now, NOCAPTCHA is a highly effective method of distinguishing humans from bots. Google has not declared publicly how it is using the data.


Humans in the loop


Though CAPTCHA has evolved massively since the early 2000s, many organisations still use legacy versions for reasons of cost and simplicity. This is a weakness exploited by CAPTCHA farms, which pay humans to solve CAPTCHA challenges on behalf of bots. Sometimes referred to as human-in-the-loop services, these illicit operations complete challenges in real time, so that bots can bypass organisations’ security measures.


Here’s how CAPTCHA farms work:


When a malicious bot encounters a CAPTCHA, it sends the challenge to a farm via an application programming interface (API). Human workers at the farm then solve the puzzle and send the result back to the bot near-instantly. The farm is compensated for each challenge it solves. Such services can be used for mass account creation, in which hundreds of fake email or social media accounts are made for the purposes of spamming. They can also be used for credential stuffing, to bypass login protections and perform account takeovers; and for data scraping, to extract information from sites that limit non-human browsing.


The introduction of AI has only made CAPTCHA farms more effective. By leveraging computer vision, machine learning (ML), and multi-modal AI models, farms can increasingly automate the solving of visual and audio challenges, dramatically reducing the time it takes to return an answer.


Banks beware


The UK government has formally advised organisations that, even with a CAPTCHA in place, services could be at risk as a direct result of these farms. The government notes that many of the risks that CAPTCHAs are aimed at reducing can be addressed in other ways, including through rate and connection limiting, honey pots, and transaction monitoring.
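
Two of the alternatives mentioned above, rate limiting and honey pots, are simple enough to sketch in a few lines. The example below is a minimal illustration under stated assumptions, not a production implementation or anything prescribed by the government guidance: the class name, the function name, and the hidden form field called "website" are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier (for example an IP address) to the
        # timestamps of its recent accepted requests.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True if the request may proceed, False if it is throttled."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Discard timestamps that have fallen outside the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def honeypot_triggered(form: Dict[str, str], honeypot_field: str = "website") -> bool:
    """A honey pot field is hidden from human visitors, so any value in it
    suggests the form was filled in by an automated script."""
    return bool(form.get(honeypot_field, "").strip())


# Usage: two sign-up attempts per ten seconds per client.
limiter = SlidingWindowRateLimiter(max_requests=2, window_seconds=10.0)
print([limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0)])
print(honeypot_triggered({"email": "a@example.com", "website": "spam"}))
```

Neither check asks the visitor to solve anything, which is why a CAPTCHA farm offers no help against it; the trade-off is that a patient attacker can slow down or spread requests across many addresses to stay under the limit.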


Despite the risks to organisations, CAPTCHA remains one of the most widely deployed security technologies on the web, with millions of companies still using systems like Google’s reCAPTCHA — including banks. According to a database that tracks almost 2.5 million companies on reCAPTCHA — from solo developers to Fortune 500 brands — the top companies using these systems span IT consulting, logistics, retail, and banking. Indeed, the likes of Accenture, Deloitte, Bank of America, Oracle and HSBC all leverage some form of CAPTCHA challenge. The most notable adopters of Arkose Labs — a common CAPTCHA provider with a 3% market share — are banks.


Typically, banks use CAPTCHA on their websites to prevent bots from logging into user accounts, scraping data, or performing credential stuffing attacks. Some, however, use CAPTCHAs for critical actions, such as authorising high-value transfers. This is because many older, free, or low-cost CAPTCHA options are still effective against entry-level bot attacks. Advanced AI, however, is a growing concern, with many cutting-edge models able to easily solve traditional text-based and image-based tests.


In failing to keep pace with the technologies used by cybercriminals, financial institutions (FIs) risk not only exposing their assets but also sacrificing user experience. Indeed, traditional, manual CAPTCHAs are a source of frustration for many customers and can increase churn rates. For these reasons, many FIs are switching to more user-friendly, secure alternatives, like NOCAPTCHA or Cloudflare Turnstile.


The death of CAPTCHA


In releasing NOCAPTCHA, the industry has staved off the mass proliferation of malicious bots, at least for now. Of course, there will always be CAPTCHA farms that can be hired to break these security checks, but organisations can at least take some comfort in knowing that computers cannot yet convincingly replicate human behaviour.


The great irony of this story is that, in using CAPTCHA challenges to distinguish humans from robots, the world taught robots how to become indistinguishable from humans. Even our most advanced internet Turing tests are now at risk of being defeated by robots. This is the view of CAPTCHA’s creator himself, Luis von Ahn, who has said he is sure that one day AI will pass our most sophisticated NOCAPTCHA challenges.


So, how long do we have? What happens when we can no longer distinguish humans from robots? And, most importantly, what is Google doing with our NOCAPTCHA data?
