How do I block bad bots and allow good ones?

Hi,
we’re using the free plan of Cloudflare and are facing a lot of bot traffic that makes our site unusable if it isn’t blocked. Most of the human traffic comes from Germany, and most of the (bad) bot traffic comes from the US. If I block the US and allow good bots (cf.client.bot) to access our site, or if I switch on Bot Fight Mode, Google Search Console cannot crawl our site and shows a 403 error. I looked in the forum and tried different things, but I couldn’t find a solution. I read that I shouldn’t block the US and that I should switch off Bot Fight Mode, but if I do that, our site gets attacked by bots and is not usable anymore. So, on the free plan, is there a way to fight bots from the US and still allow good bots like Googlebot?
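
For reference, the kind of rule I tried looks roughly like this in the custom rule expression editor, with the action set to Block (just a sketch using Cloudflare’s ip.geoip.country and cf.client.bot fields, not necessarily my exact configuration):

```
(ip.geoip.country eq "US" and not cf.client.bot)
```

The idea is to block US traffic unless Cloudflare marks the request as a verified bot.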
Thanks so much!

It’s a large list:
https://developers.google.com/static/search/apis/ipranges/googlebot.json

Monitor which IPs access your sitemap and compare them to the list above. For the ones you don’t want connecting, look up the ASN they reveal and block that, or if you see a pattern within an IP range, block the range.

Should be fairly easy, but do be careful to not block actual users.
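
If you want to automate the comparison, here is a minimal Python sketch that downloads the published list and checks whether an IP from your logs falls inside one of the ranges. It assumes the file keeps Google’s usual prefixes / ipv4Prefix / ipv6Prefix layout, and the sample IPs are only placeholders:

```python
import ipaddress
import json
import urllib.request

# Google's published Googlebot crawl ranges (same URL as above).
GOOGLEBOT_RANGES_URL = (
    "https://developers.google.com/static/search/apis/ipranges/googlebot.json"
)

def load_googlebot_networks():
    """Download the published ranges and parse them into network objects."""
    with urllib.request.urlopen(GOOGLEBOT_RANGES_URL) as resp:
        data = json.load(resp)
    networks = []
    for prefix in data.get("prefixes", []):
        cidr = prefix.get("ipv4Prefix") or prefix.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks

def is_googlebot_ip(ip, networks):
    """True if the IP falls inside one of Google's published crawler ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

if __name__ == "__main__":
    nets = load_googlebot_networks()
    # Replace these placeholders with IPs taken from your own access logs.
    for candidate in ("66.249.66.1", "203.0.113.7"):
        verdict = "in Googlebot ranges" if is_googlebot_ip(candidate, nets) else "NOT in Googlebot ranges"
        print(f"{candidate}: {verdict}")
```

Anything in your logs that claims to be Googlebot but falls outside these ranges is a candidate for the ASN or range block mentioned above.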

I’m not quite sure how to do this in practice. I tried to do exactly what you are saying, but I accidentally blocked actual users, who weren’t amused to be locked out of our site. Meanwhile Google is still not able to crawl our site and our search ranking has crashed. If the free plan of Cloudflare is not able to distinguish between bad and good bots, what is the point of using it?

This should not be happening. Would you mind posting a screenshot of the rule you tried with Known Bots ON that blocked Googlebot?

Thanks for your answer! Here are the screenshots…

And Google Search Console shows either 403 or 5xx errors. (Screenshot: Bildschirmfoto 2023-03-21 um 16.50.28)

(Screenshot: Bildschirmfoto 2023-03-21 um 16.50.52)

Could you go to your Dashboard > Security > Events and search for an event where Googlebot was blocked? Then post the details back here, especially which service (WAF, Rate Limiting, etc.) and which rule is doing the blocking.

Thanks, that is really helpful! I did a live URL inspection in Search Console and looked in the events to see whether Google is being blocked, but I could only find Googlebot requests that were not blocked but skipped. So why can our site still not be crawled, and why does it show a “Page cannot be indexed: Server error (5xx)” error?

Have you checked which URLs Google says it’s not being allowed to crawl? I ask because Google and other search engines will not only crawl URLs you submit, but sometimes also URLs you have no idea where they come from. Some of them may even be dangerous. Please see this similar topic, where I explain my opinion that even good bots should not be allowed to do whatever they want on your websites.
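
As an illustration of that point, even verified bots can be kept out of areas they have no business crawling with a custom rule whose action is set to Block. This is only a hypothetical sketch; the /wp-admin path is just an example of a section that should never be crawled:

```
(cf.client.bot and http.request.uri.path contains "/wp-admin")
```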

Interesting! Google tries to index one specific page from our site, which is totally fine and reachable, but also two pages that have nothing to do with our site! Strange. How can I change what they index? And should I add that rule to my firewall?

Also, if I try to submit a sitemap, Google shows either a 520 error or reports that the sitemap was submitted successfully…

(Screenshot: Bildschirmfoto 2023-03-21 um 17.28.35)

Thank you very much for your help!

Not sure if you still have the issue, but if so, check out Google’s URL Inspection Tool.

Google has started indexing pages! Thank you very much for your help!

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.