Rate Limiting by User Agent

Referencing the following community post: Rate Limiting by User Agent Anomaly

We are an Enterprise customer currently using global rate limiting rules for our website’s navigation and product details pages on our existing account.

We are in a similar situation to the OP from Nov 2020. For almost a year now we’ve been seeing large traffic spikes from an apparently whitelisted crawler, facebookcatalog/1.0. I can find no Facebook-sanctioned documentation about this crawler, and we’ve escalated several tickets with Facebook in an attempt to get them to dial down or throttle these spikes. The spikes seem to occur at regular 24-hour intervals, although a low level of requests continues in between. They usually double or triple our normal traffic, albeit over a relatively short time span (ten minutes). As a result, we’ve enabled a blanket User Agent Blocking rule via Firewall → Tools → User Agent Blocking. We are concerned about site stability if the current block on this user agent were removed completely.

What we would like to do is rate limit this specific user agent. When creating a new rate limiting rule, however, the rule reads: "If traffic matching the URL (http|https) (.mysite.com/) from the same IP address exceeds (10) requests per (second|minute|hour)…"

These requests come from many different IP addresses, mostly in the same country and some overseas. We’ve confirmed that they are in fact Facebook-owned IP addresses, so this appears to be legitimate crawler traffic.
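
Why a "same IP address" rule never trips here can be seen with a toy model of a single counting window. The numbers below (300 requests, 50 addresses, a threshold of 10) are hypothetical, and this is not how Cloudflare implements its counters; it only contrasts keying the count on the client IP with keying it on the user agent.

```python
from collections import Counter

THRESHOLD = 10  # hypothetical: requests allowed per key per window

# Hypothetical burst: 300 crawler requests in one window, spread
# round-robin across 50 addresses from the 203.0.113.0/24 documentation range.
burst = [(f"203.0.113.{i % 50}", "facebookcatalog/1.0") for i in range(300)]


def keys_over_limit(reqs, key_fn, threshold):
    """Count requests per key within one window and return the keys above threshold."""
    counts = Counter(key_fn(ip, ua) for ip, ua in reqs)
    return {key: n for key, n in counts.items() if n > threshold}


per_ip = keys_over_limit(burst, lambda ip, ua: ip, THRESHOLD)
per_ua = keys_over_limit(burst, lambda ip, ua: ua, THRESHOLD)

print(len(per_ip))  # 0: each address sent only 6 requests
print(per_ua)       # {'facebookcatalog/1.0': 300}
```

Spreading the same traffic over more addresses only makes the per-IP counts smaller, which is why the crawler’s spikes pass a per-IP rule untouched.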

Further down, the Create a Rate Limiting Rule interface offers a conditional HTTP Response Header(s) check, with a default of Cf-Cache-Status Not Equals HIT.

What we really need is not a check on a response header from our origin servers but a check on a request header, something similar to ClientRequestUserAgent Equals facebookcatalog/1.0.

Is there a workaround or method for rate limiting by a specific user agent on the request, instead of checking the origin response? Something halfway between a user agent block and a rate limiting rule?

To add: we don’t want to affect all bot traffic with Super Bot Fight Mode, and we do not currently subscribe to Bot Management. Thanks.

I guess this is something they are working on in the product roadmap.

If you’re an Enterprise customer, please reach out to your account team. This is in beta now, so they can turn the feature on for you. We have some documentation available here: https://developers.cloudflare.com/firewall/cf-rulesets/custom-rules/rate-limiting

With known good bots that crawl aggressively, it might be worth trying to add a crawl-delay line to your robots.txt file first. Apparently not all crawlers will respect this value, but some do.
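
Whether a crawler honors Crawl-delay is up to that crawler, and nothing in this thread confirms that facebookcatalog/1.0 does. As an illustration of how a compliant client would read the value, Python’s standard-library urllib.robotparser exposes it through crawl_delay(). The robots.txt contents, the 10-second delay and the /checkout/ path below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask the facebookcatalog crawler to wait 10 seconds
# between requests; other agents get a Disallow rule but no Crawl-delay.
ROBOTS_TXT = """\
User-agent: facebookcatalog
Crawl-delay: 10

User-agent: *
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser matches on the product token before the "/", case-insensitively.
print(parser.crawl_delay("facebookcatalog/1.0"))  # 10
print(parser.crawl_delay("SomeOtherBot/2.0"))     # None
```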

Thanks for the heads up about this Beta.

I see from the “example 3” use case in the documentation that the user-agent in the client request header can be used, but there is nothing about matching a specific user-agent value (i.e. “facebookcatalog/1.0”). I’ll need to confirm whether that is possible.

I would be doing this via the dashboard, as we are not currently using the API for our rate limiting rules.

You’re right, the example doesn’t quite fit your use case. You don’t want to count requests by unique user-agent value, since you already know the exact user agent. So in the new rate limiting you could simply match on that value, for example with a rule expression such as http.user_agent eq "facebookcatalog/1.0".
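
The behavior being described, matching one known request-header value and counting every match together regardless of source IP, can be modelled locally in a few lines of Python. This is a hypothetical sketch of the logic, not Cloudflare’s implementation or API; the FixedWindowLimiter and handle names, the limit of 100 requests per 60 seconds and the addresses are made up for illustration.

```python
from collections import defaultdict

# Exact User-Agent value to throttle, taken from the thread.
TARGET_UA = "facebookcatalog/1.0"


class FixedWindowLimiter:
    """Toy fixed-window limiter: at most `limit` requests per `period` seconds per key.

    Old windows are never evicted; this is for illustration only.
    """

    def __init__(self, limit, period):
        self.limit = limit
        self.period = period
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key, now):
        window = int(now // self.period)
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit


def handle(limiter, user_agent, client_ip, now):
    """Return "pass" or "block" for one request.

    Only requests whose User-Agent request header equals TARGET_UA are counted,
    and they all share one counter: client_ip is deliberately not part of the key.
    Every other request passes untouched, unlike a blanket User Agent block.
    """
    if user_agent != TARGET_UA:
        return "pass"
    return "pass" if limiter.allow(TARGET_UA, now) else "block"


limiter = FixedWindowLimiter(limit=100, period=60)
# 150 crawler requests in the same minute, spread over 50 addresses.
results = [handle(limiter, TARGET_UA, f"203.0.113.{i % 50}", now=5.0) for i in range(150)]
print(results.count("pass"), results.count("block"))          # 100 50
print(handle(limiter, "Mozilla/5.0", "198.51.100.7", now=5.0))  # pass
print(handle(limiter, TARGET_UA, "203.0.113.1", now=65.0))      # pass (new window)
```

Keying the counter on user_agent instead of the constant TARGET_UA would instead give one counter per distinct user-agent string, which is the per-unique-value counting the reply says the documentation example does.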

Perfect. Exactly what I was looking for: the field’s value can be compared within the dashboard (and, I assume, via the API).

Thanks very much for the help. We’ll reach out to our account manager about participating in the beta.

Last question: will this rate limiting feature, currently in beta, be included in the ongoing rollout of the new WAF rules announced recently on the blog? Or is it a completely separate feature?

This will be a separate feature, although it’s based on the same engine. It won’t be widely released until it reaches feature parity with the old rate limiting. Today, the new rate limiting can’t look at any part of the response (e.g. rate-limit only after X 403 responses), and you can’t set a custom JSON or text response.
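
To illustrate what "looking at the response" means, here is a toy Python model in which the counter is updated only after the origin answers, and only for 403 responses. The class name, the limit of 3 and the 60-second window are hypothetical; this is not Cloudflare’s implementation.

```python
class ResponseAwareLimiter:
    """Toy model of response-based counting (hypothetical, for illustration).

    Only responses whose status code is in `counted_statuses` increment the
    per-IP counter. Once `limit` such responses have been seen in the current
    window, further requests from that IP are blocked before reaching the origin.
    """

    def __init__(self, limit, period, counted_statuses=frozenset({403})):
        self.limit = limit
        self.period = period
        self.counted_statuses = counted_statuses
        self.counts = {}  # (ip, window index) -> number of counted responses

    def _key(self, ip, now):
        return (ip, int(now // self.period))

    def is_blocked(self, ip, now):
        # Request phase: decide before forwarding to the origin.
        return self.counts.get(self._key(ip, now), 0) >= self.limit

    def record_response(self, ip, status, now):
        # Response phase: runs only after the origin has answered, which is
        # the information the new engine could not use at the time of this thread.
        if status in self.counted_statuses:
            key = self._key(ip, now)
            self.counts[key] = self.counts.get(key, 0) + 1


limiter = ResponseAwareLimiter(limit=3, period=60)
ip = "198.51.100.7"
for status in (200, 403, 403, 200):
    limiter.record_response(ip, status, now=1.0)
print(limiter.is_blocked(ip, now=1.0))   # False: only two 403s so far
limiter.record_response(ip, 403, now=1.0)
print(limiter.is_blocked(ip, now=1.0))   # True: third 403 reached the limit
print(limiter.is_blocked(ip, now=61.0))  # False: new window
```

A purely request-based rule can implement only the is_blocked half; counting by status code needs the record_response step.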
