HCaptcha proven by research to be completely useless

HCaptcha has been shown by research to be completely useless in terms of preventing bots:

"• We design and develop a low-cost, end-to-end system to break hCaptcha service.
• We evaluate our system against 270 live hCaptcha challenges and achieve the success rate of attack over 95% with the system taking less than 19 seconds to crack a challenge on average.
• We provide a preliminary security analysis of the hCaptcha system. Our analysis shows that the hCaptcha service employs minimal to no mechanism to resist automated abuses other than asking users to solve a simple image recognition task."

In light of this information, can we now please switch to a more mature captcha technology, one that isn’t broken 90% of the time, obnoxious, a bad user experience AND completely ineffective at the only job it had?
There are TONS of threads about how terrible and immature hCaptcha is, but Cloudflare just WON’T listen to ANY of the feedback…

The “best” part:

“We reported our attack and countermeasures to the hCaptcha security team to help them make the system more robust to automated attacks. They responded that their system would have been pretty confident that our traffic was automated based on the techniques we used, and we would never have observed additional countermeasures. However, we did not notice any measures preventing our bot from passing the image CAPTCHA tests during our experiment.”

If this is not gaslighting 101 on hCaptcha’s part, then I don’t know what is. I don’t know how a company that behaves like this can be trusted at all. Their only selling point has been “we’re not evil google” so far, and it’s pretty unconvincing when these are their own strategies.

Our system is indeed designed not to leak detections in real-time. By contrast, with reCAPTCHA you can simply sign up and get a bot score, which makes it trivial to break.

This limits options for the free version they tested, as by design it will not completely prevent all detected automation from passing.

Instead, one of the tools it relies on is frequently changing the classes and types of challenges. However, it also has “anti-drain” protections to avoid leaking these.

Thus, our response to them after looking through the paper was that in fact the anti-drain protections were working as designed, based on the other details reported.

Interesting article. As an outside observer of a number of scraping tools and communities, I must say the combination of hCaptcha and Cloudflare’s own bot mitigation tools has been :popcorn: worthy.

We continue to collaborate with hCaptcha and other security teams to find new and exciting ways to tell legitimate users/requests apart from requests from sources customers don’t want, with a minimum of fuss on the part of real users.

I must say that overall my experience with the customers I support has been overwhelmingly positive with hCaptcha. YMMV of course. :popcorn:

Someone on GitHub wrote an hCaptcha solver in one screen of code that worked sometime in 2018-2020(?), but I never tried it myself. Somewhere between 1 out of 20 and 1 out of 100 completely random click patterns succeed with hCaptcha. With basic automation, and no AI, 20 to 40 POST requests later you win. CF revised their hCaptcha bot challenge in fall 2020 to always include a JS challenge (the “wait 5 seconds” page, a massive, heavily obfuscated/encrypted JS blob), and CF doesn’t use the hCaptcha service standalone unless it’s a new-account/login window.
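To put numbers on the random-click claim above, here is a small sketch of the arithmetic. It assumes every attempt is independent and that the per-attempt pass rate really is somewhere between 1 in 20 and 1 in 100; both figures come from the post, not from any measurement, and the function name is made up for illustration.

```python
def chance_of_at_least_one_pass(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries passes.

    Assumes a fixed per-attempt pass rate `p_single` and independence
    between attempts (both are assumptions, not measured behaviour).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement rule: 1 minus the chance that every attempt fails.
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    for p_single in (1 / 20, 1 / 100):
        for attempts in (20, 40):
            chance = chance_of_at_least_one_pass(p_single, attempts)
            print(f"p={p_single:.2f}, attempts={attempts}: {chance:.1%}")
```

Under those assumptions, 20 to 40 attempts at a 1-in-20 rate give roughly a 64% to 87% chance of at least one pass, while at 1 in 100 the same number of attempts gives only about 18% to 33%. So "20 to 40 requests later you win" holds reasonably well only at the optimistic end of the range.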

Furthermore, we submitted the same number of challenges using Selenium WebDriver for Firefox as well. Selenium is the most popular web automation software. We analyzed the results for each experimental setting to identify any discrepancies among these different settings. However, we did not notice any distinct pattern that can distinguish the settings. For example, we came across the same nine image categories, achieved similar accuracy (over 90%) in all experimental settings. Further, none of the requests were blocked in any of the experimental settings. Our analysis indicates that hCaptcha solely relies on correct image selections to verify a solution without adapting challenges based on users’ threat levels.

That is correct. If you poke around hCaptcha’s JS code, there are no attempts to detect a headless browser other than the touch/mouse x,y coords included with the POST request. hCaptcha was written to be as lightweight as possible on CF/HC servers and on clients: Pentium Classic or Pentium II lightweight. I think hCaptcha’s server-side API also lets the dev get the client’s IP in JSON, or pass the client’s IP to hCaptcha to use in a “bot score”. Anti-mechanical turk.
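For reference, passing the client's IP along on the server side looks roughly like the sketch below. It only builds the verification request and does not send it. The endpoint URL and the `secret`, `response` and `remoteip` field names are how I remember hCaptcha's public siteverify documentation, so check the current docs before relying on them; the function name is made up. Whether the reply also echoes the client IP back in JSON, as the post suggests, is not something this sketch confirms.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint, taken from memory of hCaptcha's public docs.
SITEVERIFY_URL = "https://api.hcaptcha.com/siteverify"


def build_siteverify_request(
    secret: str, token: str, remote_ip: Optional[str] = None
) -> Request:
    """Build (but do not send) a server-side verification request.

    `token` is the value the widget posted with the visitor's form;
    `remote_ip` is the optional visitor IP the site chooses to forward.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip is not None:
        fields["remoteip"] = remote_ip
    body = urlencode(fields).encode("ascii")
    return Request(
        SITEVERIFY_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_siteverify_request("my-secret", "token-from-form", "203.0.113.7")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Sending it would be a matter of `urllib.request.urlopen(req)` and parsing the JSON reply, which is left out here so the sketch runs without network access or a real secret.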

reCAPTCHA is maximally AI resistant, with its sepia-colored traffic light picking (autonomous vehicle training/Street View), but reCAPTCHA has an “as designed, bounty rejected” flaw as big as hCaptcha’s. Just log in to your Google account and reCAPTCHA will always pass with one click :smiling_imp:

Make a couple of Gmail accounts and, my oh my, reCAPTCHA lets through all the bots (you probably still need a legit Chrome/FF process, since reCAPTCHA does a JS challenge nearly identical to CF’s JS challenge).

From Dec 2020 to Feb 2021 I found the WAF hCaptcha impossible to solve: there was a minimum time between clicks, and if you solved it accurately in less than 15 seconds you always failed. As of March 2021, it’s back to summer 2020 easiness. hCaptcha always lets through 1 or 2 bad tiles, and with some of their images you truly never know if it’s the windshield of a boat or a truck. hCaptcha has a Google-accounts-style “login” feature: give hCaptcha your email address, verify it’s real, and you get 10 free image solves an hour (or a day) if you really are handicapped. The hCaptcha handicapped cookie combined with the CF WAF (403/429) always fails. Not that easy :stuck_out_tongue:

CF/HC always said the handicapped feature isn’t an exploit and works as designed, after a Japanese blogger wrote it up as an exploit. reCAPTCHA’s audio challenges were broken far more easily by FOSS AI, or I think they used an IBM SaaS audio transcription library. The NYTimes squiggly text was also broken, as many of the images were presented to users with no previously decoded text on record (any string passes). The Google dashcam captchas have never been broken, since no machine vision/autonomous vehicle software company will ever rent/sell its source code to a non-auto-industry customer. If someone wants to correct this history article, feel free.

Image Repetition. We found that hCaptcha often repeats images across different challenges. We computed the MD5 hashes of 48330 images collected from the hCaptcha challenges during our analysis and identified 9854 redundant images belonging to 1985 sets of identical images. Cryptographic hash functions such as MD5 may not provide an accurate number of repeated images since the slightest modification in the input will produce a drastic change in the output. As a result, we used the perceptual image hash (pHash) [21] algorithm to find similar or completely identical images in the submitted challenges. Interestingly, we found the same 1985 set of images in our phash analysis as well. That means while repeating the same image to multiple challenges, hCaptcha makes no attempt to modify the image and gives exact copies of it.
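The MD5 half of that analysis is easy to reproduce on any folder of saved images. Below is a minimal sketch, not the paper's code: the function names and the flat-directory layout are assumptions for illustration. It only finds byte-identical files; the pHash step the authors mention needs a third-party image library and is not shown.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_sets(image_dir: str) -> dict:
    """Group the files directly inside image_dir by MD5 digest.

    Returns {digest: [file names]} only for digests shared by two or
    more files, i.e. the sets of byte-identical images.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file():
            by_digest[md5_of_file(path)].append(path.name)
    return {d: names for d, names in by_digest.items() if len(names) > 1}


if __name__ == "__main__":
    import sys

    sets = find_identical_sets(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(sets)} sets of identical files")
```

A single changed byte (re-encoding, resizing, added noise) gives a completely different MD5, which is why the excerpt treats the matching pHash result as evidence that the repeated images were served as exact copies.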

I didn’t read the article to the end before writing my last post, but everyone has known that for a long time. If I saved all my clicks on a certain site, over a few weeks I would have solved the entire HC image set. Only 1900 unique images heh. So low.

I’d guess CF went with hCaptcha since it was the provider with the lowest CPU cost per challenge served, and CF designs products to survive a Layer 7 DDoS attack even on the free or $20-a-month plans. Hacked IoT devices asking for millions or billions of HC images to be distorted and coughed up (and never rendered, or dropped with a TCP RST/FIN after 1 packet of JPEG body) would just cause a CF global network outage as all cores on all racks max out on CPU. CF doesn’t sell CPU or GPGPU services (Nvidia ASICs), so creating billions of uncached distorted images would not be financially smart.

hCaptcha is just XORed JPEG image paths rewritten by a Cloudflare Worker, and the images live in the Cache API like on any other vanilla site. The POST form probably includes the AES XORed solve answer, so no server-side “state” is saved between presenting the challenge and verifying the answer, which would be handled by 2 different workers. So Layer 7 DDoSing hCaptcha wouldn’t work; it would just be innocent generic CDN traffic. The only thing that would happen is some Tier 1 ISP would get a call saying the 10GE link in city XXX has been saturated for an hour, can we get a 25GE or 100GE port to you? The edge servers wouldn’t run out of CPU.
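The "no server-side state" guess above describes a common pattern, and a toy version is sketched below. This is not hCaptcha's or Cloudflare's actual mechanism. The post guesses at an encrypted answer travelling with the form; the sketch swaps in an HMAC tag over the answer instead, which gives the same property: whichever worker receives the POST can check it with nothing but a shared key. The key, field names and function names are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared key; in this sketch every worker would hold a copy.
SECRET_KEY = b"demo-key-not-for-production"


def _sign(challenge_id: str, answer: str, expires_at: int) -> str:
    # Fields are joined with "|", so they must not contain "|" themselves.
    message = f"{challenge_id}|{answer}|{expires_at}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(challenge_id: str, correct_answer: str,
                    ttl_seconds: int = 120, now: Optional[int] = None) -> dict:
    """Create the token sent to the client with the challenge.

    The correct answer is never stored server-side; it is only bound
    into the tag, which the client returns with its solution.
    """
    issued_at = int(time.time()) if now is None else now
    expires_at = issued_at + ttl_seconds
    return {
        "challenge_id": challenge_id,
        "expires_at": expires_at,
        "tag": _sign(challenge_id, correct_answer, expires_at),
    }


def verify_solution(token: dict, submitted_answer: str,
                    now: Optional[int] = None) -> bool:
    """Check a submitted answer using only the token and the shared key."""
    current = int(time.time()) if now is None else now
    if current > token["expires_at"]:
        return False
    expected = _sign(token["challenge_id"], submitted_answer, token["expires_at"])
    return hmac.compare_digest(expected, token["tag"])
```

The trade-off of going fully stateless is replay: the same token and answer stay valid until the expiry passes, so a real deployment would still need some replay protection on top.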

Using a deep learning network to solve this sort of challenge is almost like cheating, especially when reCAPTCHA is just as vulnerable to this attack.
There are alternatives that make it way more challenging for the attacker; however, they also affect the user experience.

In general, image recognition challenges are defeated to an extent; another trained network will always be able to solve those challenges. Vendors need to think of an alternative that makes ML attacks more challenging without hurting the user experience the way the one in the image shown above does.