This posting is part of a series on Cloudflare’s firewall engine and discusses rules which might make your site just a tad less welcoming to automated robots and crawlers.
Although optional, HTTP requests typically contain a User-Agent field that identifies the software that made the request (e.g. your browser). We could fill page after page with details and advice on how to interpret and parse this field (and we will revisit this topic relatively soon), but for now let’s stick to a rather simple check: whether the field is present at all.
User-agent values can and will vary, but almost every HTTP client (even purely command-line ones and HTTP libraries) sends one. If there is none whatsoever, you can pretty safely assume the request comes from a very basic HTTP implementation and most likely not from a regular browser.
(http.user_agent eq "")
This will apply to all requests which either omit the user-agent altogether or just send an empty string.
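To make the matching behaviour concrete, here is a minimal Python sketch of the same check applied to a plain dictionary of request headers. It is only an illustration of the logic described above, not how Cloudflare evaluates the expression internally; the function names user_agent_value and matches_empty_user_agent are made up for this example.

```python
from typing import Mapping


def user_agent_value(headers: Mapping[str, str]) -> str:
    """Return the User-Agent value, or "" when the header is absent.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return value
    return ""


def matches_empty_user_agent(headers: Mapping[str, str]) -> bool:
    """Illustrative stand-in for (http.user_agent eq "").

    True when the header is missing or sent as an empty string.
    """
    return user_agent_value(headers) == ""


if __name__ == "__main__":
    print(matches_empty_user_agent({}))                            # header omitted
    print(matches_empty_user_agent({"User-Agent": ""}))            # empty string
    print(matches_empty_user_agent({"User-Agent": "curl/8.0.1"}))  # regular client
```

Both the omitted header and the empty string end up as the same empty value, which is why a single comparison covers both cases.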
Pro plan and higher
Here it is even easier, as these plans come with the “WAF”, which already has a built-in rule for this case. Just make sure WAF is enabled at https://dash.cloudflare.com/?to=/:account/:zone/firewall/managed-rules and rule 100001 of “Cloudflare Specials” is set to the desired action.
Of course, you can still configure the firewall rule shown above instead.
Challenge or Block?
Naturally, it is up to you and how strict you want to be, but considering that requests from real users practically always come with a valid user-agent, we can relatively safely go for a proper block in this case.
As always, don’t just copy and paste things. First evaluate whether a new rule fits your site setup, and be careful when making such changes, as they could break your site if not implemented with care. Also pay attention to the order of your firewall rules, as they are evaluated in order.
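Why the order matters can be seen in a small toy model. The Python sketch below assumes a simplified "first matching rule wins" evaluation, which is a deliberate simplification and not a description of Cloudflare's actual engine. The rule names, the /health path and the request dictionaries are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]
# (rule name, predicate, action) -- a hypothetical, simplified rule shape
Rule = Tuple[str, Callable[[Request], bool], str]


def first_matching_action(rules: List[Rule], request: Request) -> str:
    """Walk the rules in order and return the action of the first match."""
    for _name, predicate, action in rules:
        if predicate(request):
            return action
    return "pass"  # no rule matched


def empty_user_agent(request: Request) -> bool:
    return request.get("user_agent", "") == ""


def health_check_path(request: Request) -> bool:
    return request.get("path", "").startswith("/health")


# The same two rules in two different orders.
allow_first: List[Rule] = [
    ("allow-health", health_check_path, "allow"),
    ("block-empty-ua", empty_user_agent, "block"),
]
block_first: List[Rule] = [
    ("block-empty-ua", empty_user_agent, "block"),
    ("allow-health", health_check_path, "allow"),
]

# A monitoring probe that sends no user-agent.
probe: Request = {"path": "/health", "user_agent": ""}

if __name__ == "__main__":
    print(first_matching_action(allow_first, probe))
    print(first_matching_action(block_first, probe))
```

With the allow rule first, the probe without a user-agent gets through. With the block rule first, the same probe is blocked, which is exactly the kind of side effect to check for before enabling a new rule.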
Ceterum censeo, Flexible mode is insecure and should be deprecated for the sake of the security of the Internet. Cast your vote at “Header indicating encryption status of the origin connection” and get more transparency and security on the Internet.