Block all "bot" user agents: happy with the result, but one still bypasses the rule

I have this rule, which is supposed to block every user agent containing “bot”:
(http.user_agent contains "bot" and (ip.geoip.asnum ne 15169 and http.user_agent ne "Googlebot/2.1") and (not http.user_agent contains "compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm"))

I still see this user agent bypassing the rule:
Mozilla/5.0 (compatible; Riverbot/1.0; +http://www.useriver.com/bot.html)

Did I miss something in the rule?

Check the IP address to see if it’s coming from AS15169.

@sdayman
Yes … AS15169 GOOGLE :wink: So what?
Do you mean it got through because I allowed Googlebot? But then what is this Riverbot/1.0; +http://www.useriver.com/bot.html?

(http.user_agent contains "bot" and not cf.client.bot)

is probably a more sensible approach.


You’re blocking “bot” if it’s Not Equal to 15169…except it is coming from 15169.

@sdayman but it is combined with “and”. I thought this means ASN 15169 together with Googlebot/2.1:
(ip.geoip.asnum ne 15169 and http.user_agent ne "Googlebot/2.1")

If I only match on Googlebot/2.1, I’m afraid I could let in fake Googlebots.

Could that be improved further?

Nope. If it’s “bot” AND it’s NOT 15169, you’re blocking. But it IS 15169, so it won’t get blocked.
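To make that concrete, here is a rough Python model of the original rule. It is not Cloudflare syntax, only the same boolean logic written out so it can be run. The function name is made up for illustration, and 64500 is just a placeholder ASN from the range reserved for documentation.

```python
GOOGLE_ASN = 15169
BING_UA = "compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm"


def original_rule_blocks(user_agent: str, asn: int) -> bool:
    """Model of the rule from the first post; True means the request is blocked."""
    return (
        "bot" in user_agent                                       # http.user_agent contains "bot"
        and (asn != GOOGLE_ASN and user_agent != "Googlebot/2.1")  # both must be true
        and BING_UA not in user_agent                              # not the Bing user agent
    )


riverbot = "Mozilla/5.0 (compatible; Riverbot/1.0; +http://www.useriver.com/bot.html)"

# Coming from AS15169, the "asn != 15169" test is false, so the whole
# "and" chain is false and nothing gets blocked.
print(original_rule_blocks(riverbot, 15169))  # False

# The same user agent from any other ASN (placeholder value) is blocked.
print(original_rule_blocks(riverbot, 64500))  # True
```

In other words, the ASN test on its own is enough to exempt anything hosted inside Google’s network, whatever its user agent says.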

It’s best to start with Sandro’s logic and add from there.

@sandro but those are too many. I don’t need them.

:wink: How about this?
and (ne (ip.geoip.asnum 15169 and http.user_agent "Googlebot/2.1"))

But if you add NOT user agent contains Googlebot/Bingbot, it should work.

My goal is to block everything except Google and Bing. I can’t see the benefit of not cf.client.bot, since I am already blocking every “bot” without excluding Cloudflare’s Known Bots.

Is there a way to say:
block all “bot” except (the Googlebot user agent coming from ASN 15169)?

I tried modifying this: (ip.geoip.asnum ne 15169 and http.user_agent ne "Googlebot/2.1")
but the expression builder couldn’t parse it.
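For what it’s worth, the “except (Googlebot coming from ASN 15169)” wording maps onto a negated group: the exception is “ASN equals 15169 and the user agent contains Googlebot”, and the whole group is negated. Below is a minimal Python sketch of that logic only; the function names and the placeholder ASN 64500 are assumptions for illustration. In rule terms the exception would presumably look something like not (ip.geoip.asnum eq 15169 and http.user_agent contains "Googlebot"), which I have not tested in the expression builder.

```python
GOOGLE_ASN = 15169


def intended_rule_blocks(user_agent: str, asn: int) -> bool:
    """Block every "bot" user agent except Googlebot arriving from Google's ASN."""
    is_googlebot_from_google = asn == GOOGLE_ASN and "Googlebot" in user_agent
    return "bot" in user_agent and not is_googlebot_from_google


def de_morgan_form_blocks(user_agent: str, asn: int) -> bool:
    """Same rule with the negation pushed inward: note the "or", not "and"."""
    return "bot" in user_agent and (asn != GOOGLE_ASN or "Googlebot" not in user_agent)


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
riverbot = "Mozilla/5.0 (compatible; Riverbot/1.0; +http://www.useriver.com/bot.html)"

print(intended_rule_blocks(googlebot, 15169))  # False: the one allowed case
print(intended_rule_blocks(googlebot, 64500))  # True: Googlebot name, wrong network
print(intended_rule_blocks(riverbot, 15169))   # True: Google network, wrong name
```

The difference from the earlier attempt is the connective: negating “A and B” gives “not A or not B”, so writing the two “ne” tests joined by “and” exempts far more traffic than intended.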

Thank you, guys. I was thinking along the lines of what @sandro said.
I’ll try something like this:
(http.user_agent contains "bot" and (not http.user_agent contains "Googlebot" and not http.user_agent contains "bingbot") and not cf.client.bot)


@sdayman

> But if you add NOT user agent contains Googlebot/Bingbot, it should work.

Do you mean like this?
(http.user_agent contains "bot" and not cf.client.bot and http.user_agent ne "Googlebot")

You shouldn’t have to mention Googlebot. (yes, I know I said that earlier, but it’s not necessary)

If the User Agent contains the word “bot” and it’s not a Known Bot, then block it. The good bots will still get through. Here’s what I’d do:

I added Yandex because it’s a Known Bot, but you want to block it. You’d have to add a bunch of And rules for each one you don’t want visiting your site.

Known Bots is pretty accurate, so it’s better than trying any other Googlebot or Bingbot detection.
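As a rough illustration of the logic described above (block “bot” user agents that are not Known Bots, plus any Known Bot you still don’t want), here is a small Python model. This is only my reading of the description, not the actual rule: is_known_bot stands in for cf.client.bot, and the function name, the extra_blocked parameter, and the sample user agents are assumptions.

```python
def known_bots_rule_blocks(user_agent: str, is_known_bot: bool,
                           extra_blocked: tuple = ("Yandex",)) -> bool:
    """True means the request is blocked.

    is_known_bot models cf.client.bot, i.e. whether Cloudflare has
    verified the request as coming from a Known Bot.
    """
    # Anything calling itself a bot that Cloudflare has not verified.
    unverified_bot = "bot" in user_agent and not is_known_bot
    # Verified bots that are still unwanted, matched by name.
    unwanted_known_bot = any(name in user_agent for name in extra_blocked)
    return unverified_bot or unwanted_known_bot


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
yandex = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
riverbot = "Mozilla/5.0 (compatible; Riverbot/1.0; +http://www.useriver.com/bot.html)"

print(known_bots_rule_blocks(googlebot, is_known_bot=True))   # False: verified Googlebot
print(known_bots_rule_blocks(googlebot, is_known_bot=False))  # True: fake Googlebot
print(known_bots_rule_blocks(yandex, is_known_bot=True))      # True: verified but unwanted
print(known_bots_rule_blocks(riverbot, is_known_bot=False))   # True: unverified bot
```

Each extra name in extra_blocked corresponds to one more condition you would have to add to the rule by hand.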

Or…go back to your original rule and add another Or statement for user agent that contains “riverbot”.

@sdayman
Got it, thank you for that explanation.

I was thinking this: (http.user_agent contains "bot" and (ip.geoip.asnum ne 15169 and not http.user_agent contains "Googlebot"))
means any request coming from an IP that is not part of Google’s ASN and whose user-agent string does not contain “Googlebot” will be blocked.

I’ll go with the Known Bots method, then add to it. But I can only find 19 bots here.

Is that all? :slightly_smiling_face: … It would be easy if so, but they say this is a sample.

That list isn’t always up to date. That’s why I keep an eye on my logs for unwanted crawlers.

I’m assuming that any bot on that list will respect robots.txt, so you can edit that file to only allow Google and Bing.

@sdayman
What do you think about this?
(http.user_agent contains "bot" and not cf.client.bot) or (http.user_agent contains "bot" and not http.user_agent contains "Googlebot" and not http.user_agent contains "bingbot")

The first part will filter out fake Google and Bing bots … then the second part will filter out the rest of the Cloudflare Known Bots that contain “bot”.
This is similar to your approach, but covers everything that contains “bot” in one line, if I am correct.
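A quick way to sanity-check a combined rule like this is to model each condition as a boolean and try every combination. The Python sketch below does that; the function names are made up, and known again stands in for cf.client.bot. It also compares the two-part rule with a shorter form, “contains bot, and not (Known Bot that is named Googlebot or bingbot)”.

```python
from itertools import product


def combined_rule_blocks(has_bot: bool, known: bool,
                         has_googlebot: bool, has_bingbot: bool) -> bool:
    """Model of the two-part rule; True means the request is blocked."""
    first = has_bot and not known                               # unverified "bot"
    second = has_bot and not has_googlebot and not has_bingbot  # any other "bot"
    return first or second


def simplified_rule_blocks(has_bot: bool, known: bool,
                           has_googlebot: bool, has_bingbot: bool) -> bool:
    """Shorter candidate: block "bot" unless it is a verified Googlebot/bingbot."""
    return has_bot and not (known and (has_googlebot or has_bingbot))


def rules_agree() -> bool:
    """Compare both forms over all 16 combinations of the four conditions."""
    return all(
        combined_rule_blocks(*flags) == simplified_rule_blocks(*flags)
        for flags in product([False, True], repeat=4)
    )


print(rules_agree())                                   # True
print(combined_rule_blocks(True, True, True, False))   # False: verified Googlebot passes
print(combined_rule_blocks(True, False, True, False))  # True: fake Googlebot is blocked
print(combined_rule_blocks(True, True, False, False))  # True: other Known Bot is blocked
```

So, at least in this model, the rule does what is described: only verified Google and Bing crawlers get through, and every other “bot” user agent is blocked.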
