Rate Limiting: Allowing good bots (e.g. Googlebot) similar to Firewall Rules. Possible?

firewall

#1

Per the subject, is it possible to enable rate limiting but specifically ALLOW good bots?

Firewall Rules of course have Known Bots and Threat Score to accomplish this. As far as I can tell, there is no similar option in Rate Limiting. Thus, if I set a rate limit of x requests per minute and Googlebot is crawling my site at x+1, I’ll inadvertently block Googlebot, correct?
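
To make the behaviour I’m after concrete, here is a minimal Python sketch of a rate limiter that exempts known good bots. This is only an illustration of the logic, not how Cloudflare implements Rate Limiting; the `RateLimiter` class, the `known_bot` flag, and the client names are all hypothetical.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter that skips clients flagged as known good bots.

    Hypothetical sketch: `known_bot` stands in for whatever signal
    (e.g. a Known Bots check) identifies a verified crawler.
    """

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        # client id -> timestamps of that client's recent allowed requests
        self.hits = defaultdict(deque)

    def allow(self, client, now, known_bot=False):
        if known_bot:
            return True  # exempt: never counted, never blocked
        recent = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: block this request
        recent.append(now)
        return True


limiter = RateLimiter(limit=3)
# Four requests in four seconds from an ordinary client: the fourth is blocked.
print([limiter.allow("scraper", t) for t in (0, 1, 2, 3)])
# The same burst from a client flagged as a known bot is never blocked.
print([limiter.allow("crawler", t, known_bot=True) for t in (0, 1, 2, 3)])
```

Without the `known_bot` exemption, a crawler making x+1 requests per minute would be treated exactly like the scraper.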

Here’s my use case:
I have someone trying to scrape a large static site I run (which is ironic, as the site is hosted on GitHub Pages, so they could download the entire site in one click ¯_(ツ)_/¯). This is pushing me up against free-tier allotments on certain third-party APIs (which I use to pull in dynamic data). I can continue to play whack-a-mole by adding a new Firewall Rule whenever the scraper changes IP address, but I’d much rather just pay Cloudflare for Rate Limiting to do this automatically.

Of course, I still have to consider (I believe?) that cached assets (including my bundle.js, where the API calls live) are not covered by either Rate Limiting or Firewall Rules.

I didn’t see this in the docs (including paid tiers) so if this is a Feature Request feel free to adjust the topic and tags.

Thanks!


#2

Have a look at this post:

It may not be the exact rule you are looking for but shows the ‘And not a bot’ part of a rule which appears to apply to your situation.
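
As a rough illustration of what the ‘And not a bot’ condition does, here is the same logic as a small Python predicate. This is not Cloudflare rule syntax; the `should_block` function, the `known_bot` flag, and the IP ranges (reserved documentation blocks) are placeholders I made up.

```python
import ipaddress

# Hypothetical ranges standing in for the scraper's addresses
# (these are reserved documentation blocks, not real offenders).
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def should_block(ip, known_bot):
    """Block a request only if its IP is in a listed range AND it is not a known bot."""
    addr = ipaddress.ip_address(ip)
    in_blocked_range = any(addr in net for net in BLOCKED_RANGES)
    return in_blocked_range and not known_bot


print(should_block("192.0.2.10", known_bot=False))   # in range, not a bot
print(should_block("192.0.2.10", known_bot=True))    # in range, but a known bot
print(should_block("203.0.113.5", known_bot=False))  # not in any listed range
```

The ‘not a bot’ part only protects good bots from the block; the IP ranges still have to be maintained by hand.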


#3

Thanks for the link, @domjh. That’s actually what I’m doing today (the ‘whack-a-mole’ case): adding new IP ranges to a Firewall Rule as new IP addresses appear.

My hope is there’s a way to do this automatically (i.e. handle the changing scraper IP addresses) via the Rate Limiting feature (and/or some other Cloudflare offering).

Of course, the simplest option may be a client-side script giving the scraper a link to download the entire site as a zip from GitHub :smile:

