Why CAPTCHAs Will Be With Us Always

Intro

Every system we create simply reflects our relationships with other people.

What we desire from them and ourselves drives our behavior, and thus the same patterns of interaction recur in all human systems. The internet provides many sterling examples of this fundamental aspect of our world.

For example: spam, fraud, and abuse have existed online since nearly the day after two systems were first connected.

Online abuse stems from innate human behaviors

The first known spam email was sent in 1978. [1] Almost five decades later, email spam continues to plague us. Many billions of dollars and thousands of person-years of engineering effort have been put into defeating it, and yet spam persists. Billion-dollar industries now exist on both sides: those defending against spam and those sending it.

Why is this? The answer is simple. So long as there is an incentive to abuse a system, someone will generally find a way. And once they do, someone else will now have an incentive to stop them.

The balance may shift slightly from year to year, but this behavior simply reflects fundamental human nature, and five thousand years of recorded history imply the dynamic is unlikely to change any time soon.

Hope Springs Eternal

This analysis is not new, but people are eternally optimistic, especially when they think their idea is new.

The same ideas are proposed so frequently that about 20 years ago, an anonymous author came up with a helpful fill-in-the-blank for online discussions about anti-spam solutions:

Why your new anti-spam solution won’t work. Unknown author, c. 2000

Humanity verification is the same problem

Just as there will always be people who want to get your attention via email, there will always be the desire to constrain access to some resources such that only humans can access them.

Whether the goal is to prevent:

- automated mass purchases during a limited release sale
- software from downloading all the content off your website en masse
- having your web forms filled out with endless marketing spam or abuse
- credential stuffing and other account takeover attempts

among many other uses, being able to filter online traffic on the basis of “human or not” will likely remain a valuable tool so long as human nature follows the same patterns recorded over the last five thousand years.

What about machine learning?

In 2021, simple CAPTCHAs can sometimes be solved with software, thanks to advances in machine learning. To address this, hCaptcha also analyzes in detail how a challenge was answered, not just whether the answer was correct, and frequently changes both the questions and the data used.

Many of the questions asked cannot be economically solved with off-the-shelf software, and the data is generally unique to the system. As a result, bots are often completely stopped by the free service and must rely on human CAPTCHA solvers instead, which makes many kinds of abuse completely uneconomic.
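The economic argument can be made concrete with a minimal sketch. The function name and all dollar figures below are hypothetical, chosen only to illustrate the break-even logic; they are not measured costs from any real service.

```python
def attack_is_profitable(value_per_success: float,
                         success_rate: float,
                         cost_per_attempt: float) -> bool:
    """Return True when the expected revenue of one attempt exceeds its cost."""
    return value_per_success * success_rate > cost_per_attempt


# Hypothetical figures: each successful spam submission is worth $0.0005.
VALUE = 0.0005

# An unprotected form: automated attempts are nearly free.
print(attack_is_profitable(VALUE, success_rate=1.0, cost_per_attempt=0.00001))

# A protected form: each attempt now requires paying a human solver.
print(attack_is_profitable(VALUE, success_rate=1.0, cost_per_attempt=0.001))
```

Under these assumed numbers the first attack pays for itself and the second does not. The point is that the defense does not have to be unbreakable, only more expensive to defeat than the abuse is worth.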

hCaptcha Enterprise goes much further than this, moving into the realm of attempting to distinguish “good human” from “bad human” as well. It is used today by many of the world’s most popular online services to find and stop everything from purchase fraud to advanced persistent threats against their users, while maintaining a unique focus on preserving privacy.

What about hardware attestation?

Various methods of linking identity to a device in a cryptographically secure fashion, sometimes with privacy-preserving properties, have been proposed for decades. Public key cryptography dates back to c. 1975, and hardware tokens have existed nearly as long.

Unfortunately, controlling a piece of hardware does not mean you are a person. Virtually every popular consumer hardware attestation scheme has been repeatedly broken, patched, and then broken again.

Malicious abuse of these flaws is often discovered to have been occurring for months or years prior to disclosure or academic publication.

A wall of more than 10,000 phones used for abuse, part of a Chinese bot operation.

No matter how reliable your cryptographic scheme, proof of owning a piece of hardware is insufficient if someone can simply spend money to give you the answer you are looking for.

That said, cryptography is quite a young discipline. Based on recent history, your cryptographic scheme and/or implementation is likely to be broken as soon as anyone has an incentive to look at it closely, and it is likely other people will figure this out long before you do. Relying on hardware also means you may need to ask every single one of your users to change a physical device in order to patch the flaw. This is unlikely to happen quickly in most cases, meaning in reality your system will simply fail open.

This is why defense in depth is important: hCaptcha uses multiple different approaches to answer the same fundamental question, allowing comparison for consistency across all evaluations.
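As a rough illustration of the idea (not a description of how hCaptcha works internally), the sketch below combines several independent humanity scores and flags the case where they disagree. The signal names, the averaging rule, and the tolerance value are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Verdict:
    score: float       # mean of the signal scores: 0.0 (bot) to 1.0 (human)
    consistent: bool   # False when the signals disagree beyond the tolerance


def combine_signals(scores: Mapping[str, float], tolerance: float = 0.4) -> Verdict:
    """Average independent scores and check that they tell the same story."""
    if not scores:
        raise ValueError("at least one signal is required")
    values = list(scores.values())
    spread = max(values) - min(values)
    return Verdict(score=sum(values) / len(values), consistent=spread <= tolerance)


# Hypothetical signals that agree with each other.
print(combine_signals({"challenge": 0.9, "behavior": 0.8, "device": 0.85}))

# One signal looks human while another does not: worth a closer look.
print(combine_signals({"challenge": 0.95, "behavior": 0.1}))
```

An attacker who defeats one check still has to produce answers that are consistent with every other check, which is what makes layered evaluation harder to game than any single test.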

What about passive or no-challenge solutions?

Services that attempt to do bot detection with purely passive signals rapidly run into a fundamental issue: how do you validate whether a system detects bad actors correctly when you don't have accurate ground truth?

The open internet is a very noisy environment. Bad actors attack users of our service and competitors like reCAPTCHA every day, and are of course attempting to look as human as possible while doing so.

Purely passive services struggle to maintain bot detection accuracy, or even to know when they are inaccurate. Without the ability to occasionally challenge users and correctly analyze the results of that challenge, accuracy tends to decline greatly over time.

Thanks to the more accurate humanity signals that power hCaptcha, many companies that previously used reCAPTCHA v3/Enterprise have switched to hCaptcha Enterprise and seen dramatically improved accuracy, even in its purely passive mode.

We have been able to observe this directly, as some of our customers run A/B tests on the same request before they switch over entirely from Google's offerings. We are typically able to demonstrate high false positive and false negative rates coming from reCAPTCHA v3 or reCAPTCHA Enterprise.
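For readers unfamiliar with the two error rates, here is a minimal sketch of how they could be computed once ground truth is available for a set of requests. The function name and the sample data are hypothetical and do not come from any real A/B test.

```python
from typing import Sequence, Tuple


def error_rates(is_bot: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    is_bot:  ground truth for each request (True means it really was a bot)
    flagged: the detector's decision for the same request (True means blocked)
    """
    if len(is_bot) != len(flagged):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(is_bot, flagged))
    humans = sum(1 for truth, _ in pairs if not truth)
    bots = sum(1 for truth, _ in pairs if truth)
    false_pos = sum(1 for truth, flag in pairs if not truth and flag)
    false_neg = sum(1 for truth, flag in pairs if truth and not flag)
    fpr = false_pos / humans if humans else 0.0
    fnr = false_neg / bots if bots else 0.0
    return fpr, fnr


# Hypothetical sample: four bots followed by four humans.
truth = [True, True, True, True, False, False, False, False]
decisions = [True, True, True, False, False, False, False, True]
print(error_rates(truth, decisions))
```

A false positive here is a real person who gets blocked, and a false negative is a bot that gets through. Both rates depend on having trustworthy ground truth, which is the difficulty with purely passive detection described above.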

This is unsurprising given the inaccuracy of the feedback loops available in those products, and the fact that they are sold by an ad network with an overwhelming vested interest in not getting too good at bot detection, lest it harm their core business.

In conclusion

So long as people remain people, it is likely that humanity verification will have a role to play online.

Our job at hCaptcha is to find good tradeoffs between difficulty and accuracy and to keep friction low, especially for users who rely on accessibility tools. In the end, interacting with challenges is likely to remain part of the arsenal of tools for reducing online abuse so long as human nature remains unchanged.

Building a service that does this well while balancing all concerns is a very hard problem, and we are always working to make the experience as pleasant as possible. Still, we hope you will agree that reducing spam, abuse, account takeovers, and online fraud is ultimately well worth the occasional simple question.