Block robot identified by bot\*

(Bot back slash * )
Do you know how to block a bot with a backslash in its name? I can see it in my WP CP AWStats as follows:
Unknown robot identified by bot\*

Will awstats show you the complete request? You could try escaping it with another backslash, but it’d be nice to see the actual request so you can create a more accurate Firewall Rule.

Cloudflare is interpreting the backslash as a double \\, so if I add one it will be triple.
What does the complete request look like? I will double check, but I don't remember there being any more helpful info.


Could you please help me block all bots except the ones I want?
I tried the following, but my whole website was blocked:
(http.request.uri.path eq "/robots.txt" and http.user_agent ne "Googlebot") or (http.user_agent ne "Bingbot") or (http.user_agent ne "AdsBot-Google") or (http.user_agent ne "facebookexternalhit")

Truth is, most of your traffic will be bots. From all over. If you’re on a Paid Plan, you can experiment with Super Bot Fight Mode. Otherwise, you’re going to have to comb through your logs and figure out what is good traffic, and what’s bad. Which country, which ASNs, etc.
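
As a rough illustration of combing through the logs, the sketch below counts the user agents that contain "bot" in an access log. It assumes the origin writes the Apache/Nginx "combined" log format, where the user agent is the last quoted field; the function name `top_bot_agents` and the sample lines are made up for the example.

```python
import re
from collections import Counter

# In the combined log format each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def top_bot_agents(lines, limit=10):
    """Return the most frequent user agents containing 'bot' (case-insensitive)."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match and "bot" in match.group(1).lower():
            counts[match.group(1)] += 1
    return counts.most_common(limit)


# Hypothetical sample lines; real input would come from the server's access log.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2021:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2021:13:55:37 +0000] "GET /about HTTP/1.1" 200 734 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2021:13:56:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '192.0.2.9 - - [10/Oct/2021:13:57:12 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for agent, hits in top_bot_agents(SAMPLE):
        print(hits, agent)
```

Seeing the full user agent string this way gives something concrete to match in a Firewall Rule, which the AWStats summary label alone does not.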


What is wrong with the rule I used? Why did it block my whole site?
I tried this rule: (http.request.uri.path eq "/robots.txt" and http.user_agent ne "Googlebot")
It worked, but then I added another exception, like the following:
(http.request.uri.path eq "/robots.txt" and http.user_agent ne "Googlebot") or (http.user_agent ne "Bingbot")
and the whole website gets blocked!

It’s the OR user agent does not equal Facebook. That’s going to block everybody who’s not Facebook.
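
To see why the `or` version matches every request, here is a small Python sketch that mirrors the boolean logic of the rule. It is not Cloudflare code: `original_rule` and `and_rule` are hypothetical helpers, `==` and `!=` stand in for `eq` and `ne`, and the short user agent strings are simplified stand-ins for real ones.

```python
def original_rule(path, ua):
    """Mirror of the rule that blocked the whole site (clauses joined with 'or')."""
    return (
        (path == "/robots.txt" and ua != "Googlebot")
        or ua != "Bingbot"
        or ua != "AdsBot-Google"
        or ua != "facebookexternalhit"
    )


def and_rule(path, ua):
    """Same exceptions joined with 'and': matches only when every test holds."""
    return (
        path == "/robots.txt"
        and ua != "Googlebot"
        and ua != "Bingbot"
        and ua != "AdsBot-Google"
        and ua != "facebookexternalhit"
    )


if __name__ == "__main__":
    for ua in ["Googlebot", "Bingbot", "facebookexternalhit", "Mozilla/5.0"]:
        print(ua, original_rule("/", ua), and_rule("/robots.txt", ua))
```

A user agent can equal at most one of the listed names, so at least one of the `ne` clauses is always true and the `or` chain matches every request on every path. Note also that `eq` and `ne` compare the whole string, and real crawler user agents are longer than a bare name, so a `contains` test is usually closer to what is intended.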

There’s not much point tying a rule to robots.txt because bad bots don’t care about robots.txt


Isn’t that “paranoia” trigger related to mod_security and cPanel/Web server?

Maybe you want to block any bot whose User-Agent contains “bot”, which would block ahrefsbot and mj12bot for example, but still allow Googlebot and Bingbot to crawl your website?
If so, maybe the Firewall Rule below could help you. I could be wrong about this one:

(http.user_agent contains "bot" and not http.user_agent contains "Googlebot" and not http.user_agent contains "bingbot")
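
A quick way to sanity-check a rule like this before deploying it is to mirror its logic in Python and run it against a few user agent strings. This is only a sketch: `blocks` is a hypothetical helper, Python's `in` stands in for `contains` (both are case-sensitive), and `SomeBot/1.0` is an invented user agent.

```python
def blocks(ua, allowed=("Googlebot", "bingbot")):
    """True when the user agent contains 'bot' and none of the allowed names."""
    return "bot" in ua and not any(name in ua for name in allowed)


SAMPLES = {
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": False,
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)": False,
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)": True,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)": False,
    # Invented agent: capital "B" means the lowercase "bot" test does not match.
    "SomeBot/1.0": False,
}

if __name__ == "__main__":
    for ua, expected in SAMPLES.items():
        print(blocks(ua), expected, ua)
```

The last sample shows the case-sensitivity gap: an agent that only ever writes "Bot" with a capital B slips past. In the Cloudflare expression, wrapping the field as `lower(http.user_agent)` and comparing against lowercase names is one way to close it.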

Oh, then that explains why OP put a backslash in front of the asterisk. But it doesn’t explain where the bot* comes from.


> There’s not much point tying a rule to robots.txt because bad bots don’t care about robots.txt

So you mean that the bot is not just hitting robots.txt? OK, I just noticed this in AWStats:

> Numbers after + are successful hits on “robots.txt” files

Unknown robot identified by bot\* 9,106 210.76 MB

That bot has made more than 9k hits.

Hm, maybe that’s a general catch-all rule name which got triggered as the bot\* group, like maybe the User-Agent was empty?

May I ask what WP CP AWStats refers to?
Is AWStats running on the server, or is it some plugin for WordPress, I guess?
Do you actually see anything in the Cloudflare dashboard, or does the traffic somehow bypass it (given your Cloudflare options and other security settings are enabled)?

Numbers after + are successful hits on “robots.txt” files,
so hits without the + are hits that are not on robots.txt:
Unknown robot identified by bot\* 9,106 210.76 MB

If you want only Googlebot to access your robots.txt file and crawl your website by reading the line where the Sitemap is defined, then use the Firewall Rule below with the action “block”:

(http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Googlebot")

And yes, using this one, you will see Bingbot and other bots show up in the Firewall overview tab as having triggered the rule and been blocked.
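
For readers unsure how the three conditions combine, the sketch below mirrors the rule in Python. AS15169 is the AS number registered to Google, which is why the rule tests for it. `blocks_robots_request` is a hypothetical helper, and 64496 is an AS number reserved for documentation, used here as a stand-in for any non-Google network.

```python
GOOGLE_ASN = 15169  # AS number registered to Google
OTHER_ASN = 64496   # documentation-only AS number, stands in for "some other network"

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"


def blocks_robots_request(path, asn, ua):
    """Mirror of: path contains robots.txt and ASN ne 15169 and not UA contains Googlebot."""
    return "robots.txt" in path and asn != GOOGLE_ASN and "Googlebot" not in ua


if __name__ == "__main__":
    print(blocks_robots_request("/robots.txt", GOOGLE_ASN, GOOGLEBOT_UA))  # genuine Googlebot
    print(blocks_robots_request("/robots.txt", OTHER_ASN, BINGBOT_UA))     # other crawler
    print(blocks_robots_request("/robots.txt", OTHER_ASN, GOOGLEBOT_UA))   # spoofed UA
    print(blocks_robots_request("/index.html", OTHER_ASN, BINGBOT_UA))     # other path
```

Because all three tests are joined with `and`, a request is blocked only when it comes from outside Google's network and also lacks "Googlebot" in its user agent. A client on another network that spoofs a Googlebot user agent is therefore not blocked by this rule as written, and requests to any path other than robots.txt are never affected.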

By WP CP I mean the WordPress control panel.
Yes, AWStats is on the server side; I'm not using any plugins.

To see this on Cloudflare I need to put some code in one of my site's HTML pages. I didn't do that and just used AWStats.

I am sorry, but I am not familiar with this one.

Okay.
May I also ask if your domain is at Cloudflare?
Moreover, are the DNS records orange cloud (proxied via Cloudflare) in the Cloudflare DNS tab, or grey cloud (DNS only)?

In that case, you may also need to allow Cloudflare to connect to your origin host/server too.

It works now. As sdayman said, the problem was the or. I used and instead, and it works fine now.

But the problem is that I figured out the aggressive bot is not hitting robots.txt. So I need to block it from accessing the domain, which I couldn't do: Unknown robot identified by bot\*


Sure, the DNS is on Cloudflare and proxied.


OK, maybe I found a workaround. I did it this way: I did not specify a page, and I block all user agents containing “bot” except the ones I will add.


I'll keep an eye on it and see.

May I ask what AS number 15169 is? Why do you exclude it?