Hi,
we’re using the free plan of Cloudflare and are facing a lot of bot traffic that makes our site unusable if it isn’t blocked. Most of the human traffic comes from Germany and most of the (bad) bot traffic comes from the US. If I block the US and allow good bots (cf.client.bot) to access our site, or if I switch on Bot Fight Mode, Google Search Console cannot crawl our site and shows a 403 error. I looked in the forum and tried different things, but I couldn’t find a solution. I read that I shouldn’t block the US and that I should switch off Bot Fight Mode, but if I do that, our site gets attacked by bots and is not usable anymore. So, on the free plan, is there a way to fight bots from the US while still allowing good bots like Googlebot?
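To illustrate, the kind of rule I experimented with looks roughly like this (a sketch of a WAF custom rule expression with the action set to Block; the field names are from Cloudflare’s Rules language, and our actual rule may differ slightly):

```
(ip.geoip.country eq "US" and not cf.client.bot)
```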
Thanks so much!
Monitor the sitemap access for IPs and compare them to the above list. For the ones you don’t want connecting, look up their ASN and block that, or, if a pattern within a range emerges, perform a range block; see the sketch below.
Should be fairly easy, but do be careful not to block actual users.
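As a rough example, once an offending ASN shows up in your logs, a custom rule expression along these lines with the action set to Block would cover it (the ASN below is a placeholder, and the fields are the ones documented in Cloudflare’s Rules language):

```
(ip.geoip.asnum eq 64496)
```

or, to block a specific range instead (again, a placeholder range):

```
(ip.src in {198.51.100.0/24})
```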
I’m not quite sure how to do this in practice. I tried to do exactly what you suggested, but I accidentally blocked actual users, who weren’t amused to be locked out of our site. Meanwhile, Google is still not able to crawl our site and our search ranking has crashed. If the free plan of Cloudflare is not able to distinguish between bad and good bots, what is the point of using it?
Could you go to your Dashboard > Security > Events and search for an event where Googlebot was blocked? Then post the details back here, especially which Service (WAF, Rate Limiting, etc.) and which rule is doing the blocking.
Thanks, that is really helpful! I did a live URL inspection in Search Console and looked in the events to see whether Google is blocked, but I could only find Googlebot requests that were not blocked but skipped. So why can our site still not be crawled, and why does it show a “Page cannot be indexed: Server error (5xx)” error?
Have you checked which URLs Google says it is not being allowed to crawl? I ask because Google and other search engines will not only crawl URLs you submit, but sometimes also URLs you have no idea where they came from. Some of them may even be dangerous. Please see this similar topic, where I explain my opinion that even good bots should not be allowed to do whatever they want on your website.
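As a sketch of what I mean, a custom rule like the following (action Block) keeps even verified bots away from everything except the areas you actually want crawled. The paths here are placeholders and the fields come from Cloudflare’s Rules language, so adjust it to your own site:

```
(cf.client.bot and not starts_with(http.request.uri.path, "/blog/") and not http.request.uri.path in {"/" "/sitemap.xml" "/robots.txt"})
```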
Interesting! Google tries to index one specific page from our site, which is totally fine and reachable, but also two pages that have nothing to do with our site! Strange. How can I change what they index? And should I add that rule to my firewall?