This posting is part of a series on Cloudflare’s firewall engine and discusses rules which might make your site just a tad less welcoming to automated robots and crawlers.
Grown up referrers
Last time we already had a brief look at referrers and how they can help us block unsolicited visitors. While back then we were mostly focussing on the scheme part of the URL, this time we'll be checking out the hostname. And, as last time, we will only take absolute URLs into account in this article.
Even though a hostname like EXAMPLE.COM is perfectly valid and will resolve just fine (hostnames are case-insensitive), hostnames are still typically written in lowercase, and that's also how browsers handle them, converting them to lowercase where necessary. Nonetheless, some crawlers think otherwise and send an uppercase referrer. Well, thanks very much, you just gave us a perfect match to block you.
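For anyone who wants to try the idea outside Cloudflare, here is a minimal Python sketch of the check itself, not of how Cloudflare implements it. The helper name `has_uppercase_host` is hypothetical; it relies only on the standard library's `urllib.parse.urlsplit`, whose `netloc` attribute keeps the original case (unlike `hostname`, which is already lowercased).

```python
from urllib.parse import urlsplit


def has_uppercase_host(referer: str) -> bool:
    """Hypothetical helper: True if an absolute referrer has uppercase letters in its host.

    An empty referrer (no Referer header sent) is never flagged.
    """
    if not referer:
        return False
    # netloc preserves the original case; drop any "user:pass@" part first
    host = urlsplit(referer).netloc.rpartition("@")[2]
    return host != host.lower()


print(has_uppercase_host("https://example.com/page"))  # False
print(has_uppercase_host("https://EXAMPLE.COM/page"))  # True
print(has_uppercase_host(""))                          # False
```

Only the host is inspected, so an uppercase path such as https://example.com/SOME/PAGE is not flagged.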
Free & Pro plan
Unfortunately, string operations on the Free and Pro plans are limited to exact and substring matches. Realistically, this limits us to a strict blacklist: we can only check whether the referrer contains a specific uppercase version of, for example, our domain, and block based on that.
(http.referer contains "MYDOMAIN.COM")
Of course, you can extend that expression with additional OR clauses to cover other uppercase referrers as well.
(http.referer contains "MYDOMAIN.COM") or (http.referer contains "ANOTHERDOMAIN.COM")
One thing to keep in mind: this rule will also fire if your domain appears in uppercase anywhere else in a referrer URL (for example, in the path or query string of someone else's page).
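The following Python sketch emulates the combined Free/Pro rule, assuming (as the rule itself relies on) that `contains` is a plain, case-sensitive substring test. The function and constant names are hypothetical. It also illustrates both limitations of a blacklist: mixed-case variants slip through, and an uppercase domain elsewhere in the URL still triggers the rule.

```python
# Hypothetical emulation of:
# (http.referer contains "MYDOMAIN.COM") or (http.referer contains "ANOTHERDOMAIN.COM")
BLOCKED_SUBSTRINGS = ("MYDOMAIN.COM", "ANOTHERDOMAIN.COM")


def free_plan_rule_matches(referer: str) -> bool:
    """True if the referrer contains any of the exact, case-sensitive substrings."""
    return any(needle in referer for needle in BLOCKED_SUBSTRINGS)


print(free_plan_rule_matches("https://MYDOMAIN.COM/"))                  # True
print(free_plan_rule_matches("https://mydomain.com/"))                  # False
print(free_plan_rule_matches("https://MyDomain.com/"))                  # False: mixed case slips through
print(free_plan_rule_matches("https://example.org/?ref=MYDOMAIN.COM"))  # True: match outside the hostname
```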
Business plan and higher
From the Business plan onwards we will have a lot more flexibility as we can make use of the magical world of regular expressions.
Here, for example, we could use the following rule to implement a generic pattern with one single expression, without having to list any specific hostnames. We'd simply block requests (again, only if a referrer is present) whose referrers do not follow an exclusively lowercase pattern for the scheme and the hostname.
(not http.referer matches "^(?:[a-z-]+://[0-9a-z.:-]+(?:/.*)?)?$")
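To sanity-check the pattern before deploying it, you can run it through Python's `re` module. This is only an approximation: Cloudflare evaluates `matches` with its own regex engine, and the function name below is hypothetical. A return value of `True` means the rule would fire.

```python
import re

# Same pattern as the rule above; Python's re is used only as a stand-in engine.
LOWERCASE_REFERER = re.compile(r"^(?:[a-z-]+://[0-9a-z.:-]+(?:/.*)?)?$")


def business_rule_matches(referer: str) -> bool:
    """Emulates: not http.referer matches "<pattern>" (True means the rule fires)."""
    return LOWERCASE_REFERER.search(referer) is None


print(business_rule_matches(""))                              # False: no referrer, no match
print(business_rule_matches("https://example.com/path?x=1"))  # False
print(business_rule_matches("https://EXAMPLE.COM/"))          # True
print(business_rule_matches("HTTPS://example.com/"))          # True
```

Only the scheme and hostname (plus an optional port) are constrained; everything from the first `/` onwards may contain any characters. A referrer with a query string but no path, such as https://example.com?q=1, would also trigger the rule, although browsers normally send at least a `/` after the hostname.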
Challenge or Block?
It is naturally up to you how strict you want to be, but considering that real user requests are unlikely to contain a referrer with an uppercase hostname, blocking these requests should be fine.
As always, don’t just copy/paste things; first evaluate whether a new rule fits your site setup, and be careful when making such changes, as they could break your site if not implemented with care. Also, pay attention to the order of your firewall rules, as they are evaluated in sequence.
Ceterum censeo, Flexible mode is insecure and should be deprecated for the sake of the security of the Internet. Cast your vote at Header indicating encryption status of the origin connection and get more transparency and security on the Internet.