For thousands of years, people have dreamed about artificial beings, and about the challenge of telling humans apart from machines.
We live in an era of wonders, most of which we take for granted.
The lives we lead and the experiences we have today were the stuff of legends and fairy tales not so long ago. To get a sense of this we need only look back at the dreams and fantasies of our forebears to see how many are real and even prosaic to us today.
Even the earliest known fables on this topic address the need to distinguish human from machine, capturing the key ideas hundreds or even thousands of years before this became a pressing matter online.
Hiring a Bot Herder in Ancient China
King Mu of Chou made a tour of inspection in the west [c. 957 BCE]. On his return journey, before arriving in China, a certain artificer was presented to him, by name Yen Shih. King Mu received him in audience, and asked what he could do. 'I will do anything,' replied Yen Shih, 'that your Majesty may please to command. But there is a piece of work, already finished, that I should like to submit first to your Majesty's inspection.'
'Who is that man accompanying you?' asked the King. 'That, Sire, is my own handiwork. He can sing and he can act.' It walked with rapid strides, moving its head up and down, so that any one would have taken it for a live human being. The artificer touched its chin, and it began singing, perfectly in tune. He touched its hand, and it started posturing, keeping perfect time. It went through any number of movements that fancy might happen to dictate. The King could hardly persuade himself that it was not real.
Yen Shih pulled the automaton to pieces to let the King see what it really was. And lo! it turned out to be merely a conglomeration of leather, wood, glue and paint, variously colored white, black, red and blue. Examining it closely, the King found all the internal organs complete -- liver, gall, heart, lungs, spleen, kidneys, stomach and intestines -- and, over these, again, muscles and bones and limbs with their joints, skin and teeth and hair, all of them artificial.
Drawing a deep breath, he exclaimed: 'Can it be that human skill is really on a par with that of the Creator?' And forthwith he gave an order for two extra chariots, in which he took home with him the artificer and his handiwork.
-- Liezi text, Book V (c. 400 BCE)
The Liezi text is an early Chinese collection of Taoist philosophy expressed in stories. Here we see a fable describing an automaton that is outwardly indistinguishable from a person. In this case, a destructive test rather than a cognitive challenge is used to settle the question.
However, cognitive challenges are commonplace in these stories by the time we reach the 19th century.
Cognitive Challenges and Motion Analysis for Humanity Verification in 19th Century Germany
It was universally considered a quite unpardonable trick to smuggle a wooden doll into respectable tea-parties in place of a living person. Not a single soul - a few cunning students excepted - had detected it, although all now wished to play the wiseacre, and referred to various facts which had appeared to them suspicious.
The story of the automaton had struck deep and many lovers, to be quite convinced that they were not enamored of wooden dolls, would request their mistresses to sing and dance a little out of time, to embroider and knit, and play with their lapdogs, while listening to reading, etc., and, above all, not merely to listen, but also sometimes to talk, in such a manner as presupposed actual thought and feeling.
-- "The Sandman" by E. T. A. Hoffmann (1817)
The Sandman is a story in which a student falls in love with an automaton. The deception becomes widely known, and in the epilogue we are told that, to avoid the same fate, people now ask those they wish to confirm are human to perform tasks that are simple for people but difficult for machines.
In particular, we see here a form of motion analysis described (subjects asked "to sing and dance a little out of time") in which expected motions are compared to simulacra along several dimensions. In modern bot detection language, we could say that the verifiers were much more interested in the behavior (how did the automaton or person dance?) than in the challenge answer (could the automaton or person dance?).
Entering the Modern Era: Alan Turing and the "Imitation Game"
As we've seen, the core concepts here are nearly as old as recorded history. However, in 1950 computing pioneer Alan Turing codified them into a game-theoretic framework he called the Imitation Game.
This was framed as a test of a machine's ability to demonstrate intelligent behavior indistinguishable from that of a human.
Turing's original proposal took the form of a chat interface, where the player had conversations with two parties: one person and one machine.
The game proceeds as follows: the human player interacts with both entities. If the player cannot reliably tell the machine from the person based on their answers, the machine wins the game. This does not depend on the machine's ability to give correct answers to the questions asked, only whether its answers closely resemble those a person would give.
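The scoring logic can be sketched in a few lines. This is a hypothetical illustration (the responders and judge are stand-ins, not anything Turing specified): if the judge's guesses are no better than chance, the machine passes.

```python
import random

def play_round(judge, human_reply, machine_reply, rng):
    # Present the two replies unlabeled and in random order.
    replies = [("human", human_reply), ("machine", machine_reply)]
    rng.shuffle(replies)
    guess = judge([text for _, text in replies])
    # True if the judge correctly picked out the machine this round.
    return replies[guess][0] == "machine"

def naive_judge(replies):
    # A judge with no distinguishing signal can only guess;
    # here it always picks the first reply shown.
    return 0

def run_game(rounds=1000, seed=42):
    rng = random.Random(seed)
    correct = sum(
        play_round(naive_judge, "I think so.", "I think so.", rng)
        for _ in range(rounds)
    )
    return correct / rounds

accuracy = run_game()
# With identical replies the judge hovers near 50% accuracy,
# i.e. the machine wins this (toy) version of the game.
```

The point of the sketch is the win condition: the machine is never asked to be correct, only to be indistinguishable.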
Practical Implementations of Humanity Verification: 1984-1995
While many chatbots were produced specifically to attempt to pass the "Turing test" described above, by the 1980s there were more practical and immediate applications of this idea at hand.
In particular, early dial-in bulletin board systems ("BBSs") enjoyed a period of popularity from 1984-1995. They worked primarily through text-based interfaces, though by the early 1990s more graphical interfaces, including images, were sometimes used.
BBSs relied on a scarce and expensive resource: phone lines, with one line occupied per user for the duration of a session. This meant that there was a proliferation of measures to constrain access to this limited resource.
For example, many BBSs had some sort of "liveness" check built into them. This could be as simple as asking the user to press a key or enter some text every N minutes, or after N minutes of inactivity. If the user failed the liveness check, the BBS could disconnect that session, and optionally block the user account or phone number from access.
Some of these liveness checks were miniature games or humorous knowledge questions. These also deterred simple automation: some BBS clients eventually implemented random periodic keystroke insertion to defeat plain timeout checks, but inserting keystrokes could not answer a question.
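The simple timeout variant could be sketched as follows (hypothetical class and method names; real BBS software was typically written in C, Pascal, or BASIC):

```python
import time

class LivenessMonitor:
    """Disconnect a session that shows no activity within the timeout."""

    def __init__(self, timeout_seconds: float):
        self.timeout = timeout_seconds
        self.last_activity = time.monotonic()

    def record_keypress(self):
        # Any user input resets the idle timer.
        self.last_activity = time.monotonic()

    def should_disconnect(self) -> bool:
        return time.monotonic() - self.last_activity > self.timeout

# Simulate a session with a very short timeout for demonstration.
mon = LivenessMonitor(timeout_seconds=0.05)
assert not mon.should_disconnect()   # fresh session is alive
time.sleep(0.1)                      # idle past the limit
assert mon.should_disconnect()       # idle session would be dropped
mon.record_keypress()
assert not mon.should_disconnect()   # a keypress resets the timer
```

As noted above, this is exactly the check that random keystroke insertion defeats, which is why knowledge-question variants appeared.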
Some BBSs also combined this strategy with additional security measures. For example, they often used multi-factor authentication during registration. This might include a direct humanity verification via voice, in which the system operator would call the number you provided to confirm 1) the number worked, 2) that a real person was answering, and 3) that the real person was aware of the registration.
Others would ask a series of questions during registration that were either automatically verified or sent to a reviewer prior to authorization. This prevented random "wardialers" who had discovered a non-public BBS from signing up.
Finally, due to the resource constraints described above, many BBSs had download limits or credit systems in place, which functioned as anti-content-scraping systems to prevent a new user from downloading every file available and tying up a phone line for days.
1996: Modern Humanity Verification in the Online Context
By 1994, the Web was starting to come into being. By 1996, more than 36 million people had access to the Internet. Many in academia and industry had been using online systems like email and FTP to communicate for decades, but the 1990s brought a much larger and less-moderated audience into the online sphere.
As in any group of more than one person, antisocial behaviors became commonplace online. This led to a resurgence of interest in applications of the Imitation Game (or "Turing test") to deter automated fraud and abuse.
For example, in 1996 Moni Naor wrote a paper titled "Verification of a human in the loop, or Identification via the Turing Test" in which he outlines essentially all of the challenge types still in common use today.
We propose using a "Turing Test" in order to verify that a human is the one making a query to a service over the web. Thus, before a request is processed the user should answer as a challenge an instance of a problem chosen so that it is easy for humans to solve but the best known programs fail on a non-negligible fraction of the instances. We discuss several scenarios where such tests are desired and several potential sources for problems instances. We also discuss the application of this idea for combatting junk mail.
We now list a few areas that are a possible source for such problems. They are drawn from Vision and natural language processing.
Gender recognition - given a picture of a face determine whether it is a male or a female. Since there are only two possibilities the challenge should consist of, say, four pictures and the users should get all of them right.
Facial expression understanding - given a face decide whether it is happy or sad.
Find body parts - Benny Pinkas suggested that the challenge be a picture of, say, an animal and the user should click on its eye. The advantage over all other proposals here is that the number of possible answers is much larger. There should be of course some tolerance for the distance from the correct location.
Naive drawing understanding - given a drawing of, say, a house determine what it is from a list of five distinct possibilities. Dan Roth suggested adding "context", i.e. background, to the drawing - this will make it easier for people and harder for machines.
Handwriting understanding - given a handwritten word the user should type it. Again, it makes sense to add the kind of noise that people do not have a problem to ignore.
Speech recognition - the challenge is a recording of several words and the user should write them. Given progress in this area, selecting from several possibilities may be too easy; having the user write the result may be too demanding, since there are spelling errors etc.
Filling in words - Given a sentence where the subject has been deleted and a list of words, select one for the subject. Another possibility is to take a sentence and permute the order of the words. The challenge is to determine which of several possibilities is the original one.
Disambiguation - another problem from NLP (suggested by Dan Roth). The challenge is to figure out to what does "it" refer in a sentence like "The dog killed the cat. It was taken to the morgue."
-- "Verification of a human in the loop, or Identification via the Turing Test" by Moni Naor (1996)
Naor's paper serves as a nice exposition of the major considerations, but the core ideas were hardly new, being simply transpositions of much older ideas into the context of the web.
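The "find body parts" idea quoted above (click on, say, an animal's eye, with "some tolerance for the distance from the correct location") reduces to a simple geometric check. A minimal sketch, with hypothetical coordinates and tolerance:

```python
import math

def click_is_correct(click, target, tolerance_px=15.0):
    """Accept a click that lands within tolerance_px of the target point."""
    dx = click[0] - target[0]
    dy = click[1] - target[1]
    return math.hypot(dx, dy) <= tolerance_px

# Suppose the animal's eye is at pixel (212, 98) in the challenge image.
eye = (212, 98)
assert click_is_correct((220, 103), eye)      # close enough: accepted
assert not click_is_correct((120, 40), eye)   # far miss: rejected
```

The large answer space Naor highlights comes directly from this geometry: every pixel is a possible answer, so random guessing succeeds far less often than with a multiple-choice challenge.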
Nor was the recognition of their practical considerations unique to Naor. Indeed, engineers at online services such as Inktomi were starting to use these ideas commercially around the same time.
1997-2002: The Steam Engine Moment for Humanity Verification
As the popularity of online services continued to grow, more and more automated abuse naturally came along with this growth. Our companion piece "Why CAPTCHAs Will Be With Us Always" gives a bit of background as to why this was an obvious and natural occurrence.
For example, in 2000 Carnegie Mellon PhD student Luis von Ahn heard engineers from Yahoo describe their challenges with fake registrations, and coined the acronym CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart") to describe his implementation of one of the ideas laid out by Moni Naor half a decade earlier.
Many other implementations were also created or described in this period, using humanity verification for everything from anti-abuse to stopping payment fraud, denial of service attacks, and more.
However, the "CAPTCHA" acronym grew in popularity and eventually became a generic word, often used in the verb form ("that site captchas me too much").
2002-Now: The AI Arms Race
By the early 2000s, all of the core ideas developed in the 90s were in common practice, and AI researchers were working to defeat them.
Among other examples, Greg Mori and Jitendra Malik published work in 2002 on breaking Gimpy, the CAPTCHA test used at Yahoo! to screen out bots. Their method could successfully pass it 92% of the time, using general purpose algorithms designed for generic object recognition. It was also obvious how these approaches could be generalized. As they put it, "The same basic ideas have been applied to finding people in images, matching handwritten digits, and recognizing 3D objects."
Visual CAPTCHAs had also found their modern form. For example, the open source HumanAuth by GigoIt (2006) was already nearly identical to the UI used by reCAPTCHA v2 in 2014, and was itself inspired by the earlier KittenAuth, which focused on pictures of cute animals.
Many common security measures were also standard practice by then. For example, the c. 2006 HumanAuth used session salting, dynamic image watermarking to break hashing attacks, and other approaches to make automation more difficult. None of these ideas were novel in 2006, but we should give its author due credit: security measures like these were common knowledge by the mid-90s and a near-universal best practice by the late 90s, yet are still too often skipped by less conscientious developers.
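The idea behind session-salted watermarking can be illustrated in a few lines. This is a hedged sketch, not HumanAuth's actual code: each session receives a uniquely perturbed copy of the challenge image, so an attacker cannot build a lookup table keyed on the image's hash.

```python
import hashlib
import os

def watermark_for_session(image_bytes: bytes, session_salt: bytes) -> bytes:
    # Flip the low bit of a few salt-chosen positions in the raw pixel
    # data: visually negligible, but the file hash changes entirely.
    out = bytearray(image_bytes)
    for i in range(8):
        digest = hashlib.sha256(session_salt + bytes([i])).digest()
        pos = int.from_bytes(digest[:4], "big") % len(out)
        out[pos] ^= 0x01
    return bytes(out)

base = os.urandom(4096)  # stand-in for a challenge image's pixel data
a = watermark_for_session(base, b"session-a")
b = watermark_for_session(base, b"session-b")
# Two sessions see "the same" image, but hash-based matching fails:
assert hashlib.sha256(a).digest() != hashlib.sha256(b).digest()
```

Note that this sketch perturbs raw pixel data; a real implementation would re-encode the image after watermarking rather than flipping bytes in a compressed file.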
Today, we have seen an explosion of progress in the AI field that has started to make many previously intractable problems solvable by computers.
Everything from OCR to object recognition to natural language processing is being targeted for automation, and the successes have piled up in recent years. For example, the original text-based OCR challenges used by reCAPTCHA v1 and many other systems can be solved at >98% accuracy with deep learning techniques.
This has made the level of investment and expertise required to achieve robust humanity verification higher and higher each year, and now only a small number of companies like hCaptcha are still able to deliver reliable solutions.
As we are fond of saying, "online security is one of the only problems in computer science that unsolves itself as soon as you make progress."
The research team behind hCaptcha has worked in machine learning for decades and publishes many papers each year in this field, including state-of-the-art work in visual domain ML, OCR, and other areas.
Much of this work makes its way back into hCaptcha and hCaptcha Enterprise products, and we are proud to help make the online world a bit safer from online spam and abuse by increasing the cost and difficulty of attacks.
We hope you enjoyed this brief history of humanity verification!
This topic is not only our work but our passion, and gaining a sense of connection to our forebears is always fascinating, especially in showing just how little has changed when it comes to human nature.
The ways in which we understand the world and engage with the possibilities of our creations and our own nature remain largely constant, even over thousands of years.
The future will be filled with challenges, but perhaps not so incomprehensible as we might think to our relatives of 3,000 years ago.
Notes: Passages cited are condensed for clarity, but are not otherwise altered. An illustration of Yen Shih's automaton (internal details) can be found in this blog post; the original author is unknown. If you are interested in the Western history of automata, we recommend this overview.