This posting is part of a series on Cloudflare’s firewall engine and discusses rules which might make your site just a tad less welcoming to automated robots and crawlers.
We already visited the topic of user agents a couple of weeks ago when we checked for a missing or empty value. This time we take it a step further and boldly assume that every browser associates itself with its ancestor, Netscape 1.0. And they actually do: all current browsers identify first and foremost as Mozilla/5.0, and that’s what we are going to check for today. Please pay particular attention to the “Challenge or Block” section this time.
As usual, on plans without regular expression support we can only check for sub-string matches, using the contains operator. Here we simply check whether the user agent header contains (or rather, does not contain) the string "Mozilla/5.0 (". The opening parenthesis is included because browsers append the rest of their platform and version information in parentheses right after it.
(not http.user_agent contains "Mozilla/5.0 (")
One thing to note: because of the sub-string limitation just mentioned, this rule will also let through requests that carry the string somewhere in the middle of the user agent. Although it is not yet available, Cloudflare is planning to offer a starts_with function in the future, at which point you could use the following expression to be a little stricter.
(not starts_with(http.user_agent, "Mozilla/5.0 ("))
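To make the difference between the two checks concrete, here is a minimal Python sketch that mimics both expressions outside of Cloudflare. The function names and the sample user agents (in particular the made-up "EvilBot" one) are illustrative assumptions, not part of the firewall engine.

```python
# Hypothetical stand-ins for the two firewall expressions above.
MARKER = "Mozilla/5.0 ("


def flagged_by_contains(user_agent: str) -> bool:
    # Mirrors: (not http.user_agent contains "Mozilla/5.0 (")
    return MARKER not in user_agent


def flagged_by_starts_with(user_agent: str) -> bool:
    # Mirrors: (not starts_with(http.user_agent, "Mozilla/5.0 ("))
    return not user_agent.startswith(MARKER)


# Sample user agents; the last one is invented for illustration.
samples = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "curl/8.4.0",
    "EvilBot/1.0 Mozilla/5.0 (compatible)",
]

for ua in samples:
    print(flagged_by_contains(ua), flagged_by_starts_with(ua), ua)
```

The last sample slips past the sub-string check because the marker appears in the middle of the header, while the prefix check flags it.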
As always, on plans that do support regular expressions we can take a more elegant approach.
(not http.user_agent matches "^Mozilla/5\.0 \(.+/\d\d")
With this expression we check whether the user agent starts with "Mozilla/5.0 (", continues with an arbitrary number of characters (at least one), and eventually contains a slash followed by two digits, which is supposed to be the start of a version number (the days of reasonable versioning have long been abandoned).
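If you want to try the pattern before deploying it, a rough local approximation is possible with Python's re module. This is only a sketch: Cloudflare uses its own regular expression engine, so treat the result as indicative, and note that the helper name and the sample strings are assumptions chosen for illustration.

```python
import re

# Same pattern as in the firewall expression, compiled with Python's engine.
UA_PATTERN = re.compile(r"^Mozilla/5\.0 \(.+/\d\d")


def flagged_by_regex(user_agent: str) -> bool:
    # Mirrors: (not http.user_agent matches "^Mozilla/5\.0 \(.+/\d\d")
    return UA_PATTERN.search(user_agent) is None


samples = [
    # Typical browser: "Gecko/20..." satisfies the slash + two digits part.
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    # Commonly published Googlebot string: no slash followed by two digits.
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    # Marker not at the start of the header.
    "EvilBot/1.0 Mozilla/5.0 (compatible) Fake/12",
    "curl/8.4.0",
]

for ua in samples:
    print(flagged_by_regex(ua), ua)
```

Only the first sample passes. The Googlebot-style string is flagged because it never contains a slash followed by two digits, which illustrates how easily legitimate crawlers can be caught by this rule.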
As for whether to challenge or block: this time it’s a bit tricky. These rules are aimed mostly at browsers, so you can easily lock out regular bots (although e.g. Google and Bing still identify as Mozilla/5.0 as well). For this reason you might want to combine the rule with not cf.client.bot, to at least exclude Cloudflare’s known bots from the check. Furthermore, the rule would block general tools like cURL, so you should really fine-tune the rule for your own use case.
Depending on how you eventually customised the rule, outright blocking might not be the best course of action, and one of the challenge types might be the safer choice.
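The combination with not cf.client.bot and a per-site exception for tools can be reasoned about as plain boolean logic. The sketch below models it in Python; the is_known_bot flag stands in for cf.client.bot, and the allowed_tools parameter as well as the bot name are hypothetical additions for illustration, not Cloudflare features.

```python
MARKER = "Mozilla/5.0 ("


def rule_matches(user_agent: str, is_known_bot: bool,
                 allowed_tools: tuple = ()) -> bool:
    """Return True when the request would be challenged or blocked."""
    # Stand-in for "not cf.client.bot": known bots skip the check entirely.
    if is_known_bot:
        return False
    # Hypothetical fine-tuning: let selected tools through by prefix.
    if any(user_agent.startswith(tool) for tool in allowed_tools):
        return False
    # Mirrors: (not http.user_agent contains "Mozilla/5.0 (")
    return MARKER not in user_agent


print(rule_matches("curl/8.4.0", is_known_bot=False))
print(rule_matches("curl/8.4.0", is_known_bot=False, allowed_tools=("curl/",)))
print(rule_matches("ExampleVerifiedBot/1.0", is_known_bot=True))
```

Without the exception the cURL request is caught; with it, or when the request comes from a known bot, the rule no longer applies.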
As always, don’t just copy/paste things. First evaluate whether a new rule fits your site setup, and be careful when making such changes, as they could break your site if not implemented with care. Also, pay attention to the order of the firewall rules, as they are evaluated in order.