Tons of bot traffic. How to limit it with Cloudflare?


I’ve just found that I’m receiving tons of hits per minute from Googlebot, Bingbot, Yandex bots, AhrefsBot, Applebot…

I’m only interested in the bots of the most important search engines (Google and Bing), and would like to limit the traffic from the rest. I’m aware of the ‘Crawl-delay’ directive for ‘robots.txt’, but I guess that not all bots will respect it.
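For reference, a ‘Crawl-delay’ stanza in ‘robots.txt’ looks like this (the delay is in seconds, the bot names are just examples, and note that Googlebot ignores this directive entirely, so it only helps with crawlers that honour it):

```
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
```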

I’ve been exploring the options for limiting bot traffic with Cloudflare (Dashboard > Firewall > Tools), but I’m not sure which one is best.

Any tip would be welcome. Thank you very much.

Is this also related to these topics?

Have you got “Bot Fight Mode” enabled in the Cloudflare dashboard?
Are you using any firewall rules or page rules?
How about Security Level in your Cloudflare dashboard?

I use firewall rules like this:

If it’s a Verified Bot, AND it’s one of these bots, then Allow.

  1. ALLOW (cf.client.bot and (http.user_agent contains "UptimeRobot" or http.user_agent contains "DuckDuckBot" or http.user_agent contains "Googlebot" or http.user_agent contains "bingbot"))

And any other Verified Bots get blocked:

  1. BLOCK (cf.client.bot)

Hi @sdayman,

Thank you very much for your nice answer. I’ve just implemented the rules you suggested.

In just 5 minutes, CF has:

  • allowed 1.47k events from Google
  • allowed 900 events from Microsoft
  • blocked 1.18k events from Facebook (facebookexternalhit/1.1)

Is it worthwhile to allow visits from Facebook’s bot? I guess these visits are used to crawl my pages’ OpenGraph tags so that a summary of their content can be shown to FB’s users, but I don’t know if the load on my server is worth it.

Thank you.

That’s quite an interesting number for Google, as it matches their own 5 hits per second example.

The Microsoft rate equates to exactly 3 hits per second. So those numbers don’t seem to be out of line with standard practice.
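A quick sanity check of those rates (the event counts are the ones reported above; the five-minute window is taken from that post):

```python
# Convert the 5-minute event counts reported above into per-second rates.
WINDOW_SECONDS = 5 * 60  # the 5-minute observation window

google_events = 1470     # "allowed 1.47k events from Google"
microsoft_events = 900   # "allowed 900 events from Microsoft"

google_rate = google_events / WINDOW_SECONDS
microsoft_rate = microsoft_events / WINDOW_SECONDS

print(f"Google: {google_rate:.1f} req/s")        # Google: 4.9 req/s
print(f"Microsoft: {microsoft_rate:.1f} req/s")  # Microsoft: 3.0 req/s
```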

As for Facebook, it’s completely up to you how to handle that.



By default the crawl rate was set to “Let Google optimize for my site (recommended)”, but when I selected “Limit Google’s maximum crawl rate”, the slider sat at 3.5 requests per second.

I’ve just moved it down manually to 0.7 requests per second.

Thank you for your nice answer.

Edit: since I set up the Firewall Rules, Googlebot has visited my website 16.5k times.


Hi again,

I’m seeing that I’m still receiving tons of requests from ‘SemrushBot’. I’ve disallowed it in my ‘robots.txt’ and also added a PHP check that detects its user agent and blocks it.
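The PHP snippet isn’t shown, but the idea is a simple substring match on the User-Agent header, much like Cloudflare’s `contains` operator. A minimal sketch of that check (in Python rather than PHP, with an illustrative marker list):

```python
# Case-insensitive substring check against a list of unwanted crawler
# user-agent markers. The list below is illustrative, not exhaustive.
BAD_BOT_MARKERS = ["SemrushBot", "AhrefsBot"]

def is_bad_bot(user_agent: str) -> bool:
    """Return True if the user agent matches any bad-bot marker."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in BAD_BOT_MARKERS)

# A server would typically answer matching requests with 403 Forbidden.
print(is_bad_bot("Mozilla/5.0 (compatible; SemrushBot/7~bl)"))  # True
print(is_bad_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```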

Why isn’t ‘SemrushBot’ flagged as a verified bot? It should have been blocked, right?

I don’t think SEMrush is on the list. The list isn’t up to date, but here it is:


Thank you very much again, @sdayman.

I’ve built a boolean condition to catch all those “bad bots” using

(http.user_agent contains "SemrushBot") or ....

In 5 minutes, it has already caught 500 requests from these bad bots.
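For anyone following along, such an expression might look like this (only "SemrushBot" comes from this post; the other names are illustrative, taken from the crawlers the original poster listed at the top of the thread):

```
(http.user_agent contains "SemrushBot") or
(http.user_agent contains "AhrefsBot") or
(http.user_agent contains "Applebot")
```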

