Firewall rule not working properly

Hi,

I created a firewall rule that forces every user to complete the captcha unless their user-agent contains words like “google”, “bing”, “yandex”, etc. (search engines).

In my example (image below), I wrote in the rule that if the user agent contains the string “ahrefs”, it should not trigger the captcha challenge.

However, as you can see, the rule doesn’t seem to work at all: all search engines like Google, Bing, Semrush, etc. (which should be whitelisted according to my rule) trigger the captcha challenge when they shouldn’t.

Thanks for helping me

The logic of the expression is incorrect.

If the UA contains “ahrefs” then it will match all of the other NOT expressions, and get challenged. You want “and” instead of “or”.

There is specific guidance on how to avoid blocking known good bots.

Thanks very much for your answer.

But using the operator “AND” instead of “OR”, wouldn’t that mean that all the conditions need to be satisfied?

For example, if I choose:

if (UA does not contain “Google”) AND (UA does not contain “Bing”) => Captcha challenge

That would mean that the UA should contain both “google” and “bing” in the logs, which is actually impossible because it would only contain “google” (if it’s the google bot) or “bing” if it’s the bing bot.

Hope that makes sense.

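No. Using AND means the challenge is triggered only when all of the conditions are true. If any one of them is false (for example, because the UA does contain “Google”), the rule won’t trigger.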

However, as @michael has said, there’s a better and safer way to exclude known bots from your rule. You should read the documentation and implement the known bots variable instead.

User agents are very easy to forge and anyone could bypass your firewall rule by simply pretending to be Google or Yandex etc.
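To illustrate why a user-agent check is weak, here is a minimal Python sketch of a substring-based exemption. The function name `is_exempt_by_user_agent`, the allow list, and the sample strings are made up for this example; this is not how Cloudflare evaluates rules, only the same kind of "contains" test written out.

```python
# Hypothetical allow list, mirroring the words used in the rule above.
ALLOWED_WORDS = ("google", "bing", "yandex")


def is_exempt_by_user_agent(user_agent: str) -> bool:
    """Return True if the UA contains any allowed word (case-insensitive)."""
    ua = user_agent.lower()
    return any(word in ua for word in ALLOWED_WORDS)


# An ordinary browser UA contains none of the words, so it is not exempt.
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0"

# Any client can send an arbitrary User-Agent header, including this one.
spoofed_ua = "my-scraper/1.0 (pretending to be Googlebot)"

print(is_exempt_by_user_agent(browser_ua))
print(is_exempt_by_user_agent(spoofed_ua))
```

The check only looks at text the client chose to send, so the spoofed string passes exactly as a real crawler would.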

Let’s break down the logic.

(UA does not contain “Google”) could be written “The request was not made by Googlebot”

  • If a request comes in from Googlebot this will evaluate as false, because the request was in fact made by Googlebot.

  • If a request comes from BingBot the expression will be true, because the request was not made by Googlebot.

  • If a request comes from a Browser (or something else that does not match any of your expressions) then the expression will again be true, because the request was not made by Googlebot.

Standard Boolean logic is:

  • A AND B is true if both A and B are true.
  • A OR B is true if either A or B is true (including if both A and B are true).

So with a few examples:

Request made by Googlebot:
(UA does not contain “Google”) ==> false
(UA does not contain “Bing”) ==> true
Result with OR: true (challenged)
Result with AND: false (not challenged)

Request made by Bingbot:
(UA does not contain “Google”) ==> true
(UA does not contain “Bing”) ==> false
Result with OR: true (challenged)
Result with AND: false (not challenged)

Request made by Browser:
(UA does not contain “Google”) ==> true
(UA does not contain “Bing”) ==> true
Result with OR: true (challenged)
Result with AND: true (challenged)

The only scenario where OR will evaluate as false would be where you had a bot that could appear to be every bot you have defined, which we can discount as being impossible. In fact, such a bot is 100% fake, but is the only thing which will be bypassed by your expression!
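The examples above can be reproduced with a short Python sketch. The function `should_challenge` and the sample user-agent strings are illustrative assumptions, not Cloudflare's implementation; the sketch only builds the same "UA does not contain X" conditions and combines them with OR or AND.

```python
def should_challenge(user_agent: str, bot_words: list[str], combine: str) -> bool:
    """Evaluate 'UA does not contain X' for each word, joined by OR or AND."""
    ua = user_agent.lower()
    conditions = [word.lower() not in ua for word in bot_words]
    if combine == "or":
        return any(conditions)   # true if at least one condition is true
    return all(conditions)       # true only if every condition is true


bot_words = ["google", "bing"]

# Sample user-agent strings, chosen for illustration.
agents = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Bingbot": "Mozilla/5.0 (compatible; bingbot/2.0)",
    "Browser": "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
    "Fake bot": "I am google and bing at the same time",
}

for label, ua in agents.items():
    with_or = should_challenge(ua, bot_words, "or")
    with_and = should_challenge(ua, bot_words, "and")
    print(f"{label}: OR -> {with_or}, AND -> {with_and}")
```

With OR, only the "Fake bot" string that names every bot at once avoids the challenge. With AND, each real bot is skipped and the browser is still challenged, which is the same as writing NOT (UA contains “Google” OR UA contains “Bing”).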

Thanks very much for your help. I’ve also used the cf.client.bot field to make things easier with known bots.