Cloudflare lets bots through the CAPTCHA challenge under high load

I have set up what I believe is the correct configuration: two basic firewall rules. The first rule allows all visitors who arrive from the Yandex and Google search engines, regardless of the visitor’s IP.
The second rule sends visitors whose IP addresses are on a blacklist to the CAPTCHA challenge.
On a site with low traffic, everything works correctly and all bots are blocked.
On a site with high traffic, this firewall configuration fails and lets some of the bots through the CAPTCHA. The firewall analytics show that all bots with a specific referer and an IP on the list hit the CAPTCHA, but some of them still get past the firewall although they should be blocked. Apparently Cloudflare fails with this configuration under heavy load. How can this be fixed so that bots can no longer bypass the CAPTCHA?

The configuration contains two basic rules. The first rule lets through all visitors whose referer is Yandex or Google. The second rule filters bots that are on the IP blacklist and have an empty referer, that is, they try to reach the site directly. The same configuration behaves differently on different sites. Where the connection load is low, all bots are stopped at the CAPTCHA. Where the load is high (about 40 thousand requests per day), some of the bots get around the CAPTCHA challenge: the firewall log clearly shows that they hit the challenge, but then they somehow bypassed it.
I suspect that with this configuration Cloudflare cannot keep up with a large number of simultaneous bot requests.
Is it possible to fix this? Can the situation be analyzed together with Cloudflare specialists, and what is needed for that?

There shouldn’t be any load limitations on the Firewall. It seems more likely that the bot requests contained a referrer and bypassed your rule.

Although bots often do go direct, it’s trivial to add a header to the request.

Can you share logs showing these requests?

If only Firewall Events Log would show the referer. That would make many people extremely happy. Do you know why that field isn’t included?

The referer field is not always set by browsers, sometimes for privacy or by individual settings.

How is the rule set up? That combination of AND / OR doesn’t work well through the builder, so you should create those rules in the expression editor instead, in the format ((A1 AND A2) OR (B1 AND B2) OR …), using the parentheses to make sure the conditions are evaluated in the order you want.

I did not set any load limitations. Where can I check this?

Expression Preview


(http.referer eq "" and ip.src in $ip_filter) or (http.referer contains "instagram" and ip.src in $ip_filter) or (http.referer contains "vk" and ip.src in $ip_filter) or (http.referer contains "facebook" and ip.src in $ip_filter) or (http.referer contains "mail" and ip.src in $ip_filter) or (http.referer contains "t.co" and ip.src in $ip_filter) or (http.referer contains "ok" and ip.src in $ip_filter) or (http.referer contains "youtube" and ip.src in $ip_filter) or (http.referer contains "twitter" and not ip.src in $ip_filter) or (http.referer eq "zen.yandex.ru" and ip.src in $ip_filter)

byazrov.ru 178.176.79.164 - - [20/May/2021:15:38:53 +0300] "GET / HTTP/1.1" 200 41888 "https://byazrov.ru/" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1" 13615 13333:0

This is a fragment of the server log showing a visit from a bot that got through the CAPTCHA challenge without solving it. The full log file is attached, where you can find this entry, together with an export of the corresponding request from Cloudflare.
This bot matched the Cloudflare rule and was sent to the CAPTCHA challenge. After that, it somehow bypassed the CAPTCHA without solving it and reached the site as a direct visit. I marked this bot’s direct visit in red in the metrics. You can see this visit in the server log. Many similar bots are successfully stopped by Cloudflare’s CAPTCHA challenge and do not pass it. But a small share of the bots somehow gets through the CAPTCHA challenge and reaches the site without actually solving the CAPTCHA.
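One rough way to test the referer hypothesis raised earlier in the thread is to parse the log entry and compare its referer with the referer conditions of the rule. The following is a minimal Python sketch, not Cloudflare's matching engine. The regular expression and the helper names parse_log_line and referer_matches_rule are assumptions made for illustration, and the sketch models "eq" as exact string equality and "contains" as a case-sensitive substring test.

```python
import re

# Access log entry quoted in the post above (with straight quotes).
LOG_LINE = (
    'byazrov.ru 178.176.79.164 - - [20/May/2021:15:38:53 +0300] '
    '"GET / HTTP/1.1" 200 41888 "https://byazrov.ru/" '
    '"Mozilla/5.0 (iPhone; CPU iPhone OS 13_7 like Mac OS X) '
    'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 '
    'Mobile/15E148 Safari/604.1" 13615 13333:0'
)

# Hypothetical pattern for this host-prefixed log layout (an assumption,
# adjust it to the actual server log format).
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Referer conditions copied from the rule expression in this thread.
REFERER_EQUALS = ("", "zen.yandex.ru")
REFERER_CONTAINS = (
    "instagram", "vk", "facebook", "mail", "t.co", "ok", "youtube", "twitter",
)


def parse_log_line(line):
    """Split one log line into named fields, or raise ValueError."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized log line")
    return match.groupdict()


def referer_matches_rule(referer):
    """Return True if any referer condition of the rule applies."""
    if referer in REFERER_EQUALS:
        return True
    return any(token in referer for token in REFERER_CONTAINS)


entry = parse_log_line(LOG_LINE)
print(entry["ip"], repr(entry["referer"]), referer_matches_rule(entry["referer"]))
# prints: 178.176.79.164 'https://byazrov.ru/' False
```

In this model the logged request carries the site's own URL as its referer, so none of the referer conditions in the expression apply to it. That would be consistent with the request never being challenged by this rule, although it does not show by itself whether the visitor is a bot.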

The more connection attempts the bots make, the more of them get past this Cloudflare CAPTCHA rule. Somehow the load, in the form of the number of bot connection attempts that reach the CAPTCHA challenge, plays a role here. The rule sends all bots with a specific referer to the CAPTCHA challenge. If I change the action of the firewall rule from CAPTCHA challenge to block, all bots are blocked completely. Some bots are handled incorrectly only when the action is the CAPTCHA challenge rather than a block.



https://disk.yandex.ru/d/ZjtoYSiyCdrc_A
https://disk.yandex.ru/d/M_R0v6ywLAzGCw

Interesting. The user-agent seems innocuous enough - are you sure it’s a bot and not a human coming through a link posted on one of the sites listed as referrers?

What’s the quality of the IPs in $ip_filter? Are they confirmed bots/spammers only - no chance of humans coming through those addresses?

With unconditional filtering of all visitors on the IP blacklist, 100% of the bots are blocked. But this option loses real customers who do not want to go through the CAPTCHA.
Therefore I made a more complex firewall rule, which lets all visitors from the Yandex and Google search engines reach the site and shows the CAPTCHA only to visitors who arrive directly or from social networks. But with this more complex rule I see a hole in the CAPTCHA challenge through which bots somehow get past the defense without solving the CAPTCHA. In general, these bots are not able to pass this CAPTCHA challenge the way a person would.
The bots I am fighting know how to fake referers, how to bypass Google reCAPTCHA v2 and v3, and how to bypass the JS challenge. The only thing they cannot get around is the CAPTCHA challenge, but only with certain firewall rules.

With the complex firewall rule that I have set up now, about 1-5% of bots get past the CAPTCHA without solving it.

Screenshot 111 shows a firewall rule with which 100% of bots are blocked and none get past the CAPTCHA.
Screenshots 222 and 333 show the more complex firewall rule, with which 1-5% of bots get past the CAPTCHA without solving it. This happens when the action of the firewall rule is set to CAPTCHA challenge. If I set a full block instead of the CAPTCHA challenge, then 100% of bots are blocked and none of them reaches the site. I have already tested this on various sites that have been attacked by these behavioral bots.



Why is the Twitter referral test different from the others - using “not in $ip_filter”?

That’s my mistake. I corrected it:

(http.referer eq "" and ip.src in $ip_filter) or (http.referer contains "instagram" and ip.src in $ip_filter) or (http.referer contains "vk" and ip.src in $ip_filter) or (http.referer contains "facebook" and ip.src in $ip_filter) or (http.referer contains "mail" and ip.src in $ip_filter) or (http.referer contains "t.co" and ip.src in $ip_filter) or (http.referer contains "ok" and ip.src in $ip_filter) or (http.referer contains "youtube" and ip.src in $ip_filter) or (http.referer contains "twitter" and ip.src in $ip_filter) or (http.referer eq "zen.yandex.ru" and ip.src in $ip_filter)
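Since every clause of the corrected expression now repeats the same IP test, the rule can be read as "IP is in the list AND the referer matches one of the conditions". The Python sketch below models both readings and checks that they agree on a few sample requests. It is only an illustration under stated assumptions: the function names, the sample referers, and the addresses in IP_FILTER (taken from documentation-only ranges) are all hypothetical and are not the real contents of $ip_filter.

```python
import ipaddress
from itertools import product

# Referer conditions copied from the corrected rule in this thread.
REFERER_EQUALS = ("", "zen.yandex.ru")
REFERER_CONTAINS = (
    "instagram", "vk", "facebook", "mail", "t.co", "ok", "youtube", "twitter",
)

# Hypothetical stand-in for $ip_filter (documentation-only address ranges).
IP_FILTER = ("203.0.113.0/24", "198.51.100.7")


def in_ip_filter(ip, ip_filter):
    """Return True if ip falls inside any network or address in ip_filter."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(net) for net in ip_filter)


def long_form(referer, ip, ip_filter):
    """Mirror the rule as written: one (referer AND ip) clause per condition."""
    listed = in_ip_filter(ip, ip_filter)
    clauses = [referer == value and listed for value in REFERER_EQUALS]
    clauses += [token in referer and listed for token in REFERER_CONTAINS]
    return any(clauses)


def factored(referer, ip, ip_filter):
    """Same rule with the shared IP test pulled out in front."""
    referer_hit = referer in REFERER_EQUALS or any(
        token in referer for token in REFERER_CONTAINS
    )
    return in_ip_filter(ip, ip_filter) and referer_hit


# Sample requests (assumed values chosen only for the comparison).
SAMPLE_REFERERS = (
    "", "https://vk.com/feed", "https://www.google.com/search",
    "https://byazrov.ru/", "zen.yandex.ru", "https://twitter.com/",
)
SAMPLE_IPS = ("203.0.113.5", "198.51.100.7", "192.0.2.10")

equivalent = all(
    long_form(referer, ip, IP_FILTER) == factored(referer, ip, IP_FILTER)
    for referer, ip in product(SAMPLE_REFERERS, SAMPLE_IPS)
)
print("equivalent on all samples:", equivalent)
```

If the two forms are indeed equivalent, the expression could probably be shortened to ip.src in $ip_filter and (http.referer eq "" or http.referer contains "instagram" or ...), which makes a slip like the earlier "not" on the Twitter clause harder to introduce. Note also that in this model the "ok" substring test already covers any referer containing "facebook".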

After the correction, is the rule blocking correctly now?

no
