This post is part of a series on Cloudflare’s firewall engine and discusses rules that might make your site just a tad less welcoming to automated robots and crawlers.
May I have your scheme?
Every HTTP request can optionally contain a header field that indicates the URL from which the request originated: the Referer field. Fun fact: the typo goes back to 1996 and RFC 1945. One important thing to remember: the field is completely optional and under the complete control of the client, so you can’t and shouldn’t rely on it for any security or business logic.
Now, technically, the referrer value can be either an absolute or a relative URL. In practice, however, browsers implemented it in a fashion that would make Louis XIV proud. So, for the sake of argument, we will only take absolute URLs into account in this article.
Every URL is made up of several elements, one of which is its scheme (the famous http://), which is mandatory (technically, the scheme is only the part before the colon, but we will ignore that for now). Still, some crawlers send a referrer but omit the scheme, and that’s one way we can block them.
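To make "scheme" a bit more tangible, here is a minimal Python sketch using the standard library's urllib.parse.urlsplit. It has nothing to do with how Cloudflare parses the header; the helper name referer_scheme and the sample values are made up for illustration.

```python
from urllib.parse import urlsplit


def referer_scheme(referer: str) -> str:
    """Return the scheme of a Referer value, or "" if it has none.

    urlsplit treats everything before the first colon as the scheme,
    provided it only contains characters that are legal in a scheme,
    and returns it lower-cased.
    """
    return urlsplit(referer).scheme


# Hypothetical sample values: two well-formed referrers and one
# of the scheme-less kind that some crawlers send.
for value in ("https://example.com/page",
              "android-app://com.example.app",
              "example.com/page"):
    print(repr(value), "->", repr(referer_scheme(value)))
```

The last sample comes back with an empty scheme, which is exactly the kind of value the rules in this post are meant to catch.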
Free & Pro plan
Unfortunately, string operations on the Free and Pro plans are limited to exact and sub-string matches, which is why we can only use the contains operator here.
If we configure the following rule, we will cover all requests which, if a referrer is present, do not contain a valid scheme definition for HTTP or HTTPS:
(http.referer ne "" and not http.referer contains "http://" and not http.referer contains "https://")
Keep in mind that these two schemes are only the most common examples. There are plenty of others (e.g. android-app:// for links opened from Android applications), and you should add them to your expression if they apply to your use case.
Even though browsers never send an upper-case scheme, it would technically be valid, and if you want to support that too you could make use of the lower() function:
(http.referer ne "" and not lower(http.referer) contains "http://" and not lower(http.referer) contains "https://")
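If you would like to try the logic of the two Free-plan expressions against a list of referrers from your logs before deploying anything, a rough Python emulation could look like the sketch below. It only mimics the boolean logic (non-empty and no known scheme substring). The function name rule_matches and its parameters are made up, and it is not Cloudflare's implementation.

```python
def rule_matches(referer, schemes=("http://", "https://"), ignore_case=False):
    """Return True if the emulated rule would fire for this referrer.

    Mirrors: http.referer ne "" and not http.referer contains <scheme> ...
    With ignore_case=True the value is lower-cased first, like lower().
    """
    value = referer.lower() if ignore_case else referer
    return referer != "" and not any(scheme in value for scheme in schemes)


# Hypothetical sample values
print(rule_matches(""))                                       # False: no referrer sent
print(rule_matches("https://example.com/"))                   # False: valid scheme
print(rule_matches("example.com/page"))                       # True: scheme missing
print(rule_matches("HTTP://example.com/"))                    # True: case-sensitive check
print(rule_matches("HTTP://example.com/", ignore_case=True))  # False: lower-cased first
```

One thing the sketch makes visible: because this is a plain sub-string check, a value such as example.com/?next=http://other.example/ is not caught, since "http://" appears somewhere in the string even though the referrer itself has no scheme.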
Business plan and higher
From the Business plan onwards, we have a lot more flexibility, as we can make use of the magical world of regular expressions.
Here, for example, the following rule achieves in one single expression what we just did on the Free plan, and on top of that it accepts every syntactically valid scheme string:
(not http.referer matches "^(?:[a-zA-Z][a-zA-Z0-9+.-]*://.+)?$")
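The pattern is easy to try out locally before you put it into a rule. The sketch below runs the very same regular expression through Python's re module. That is not the engine Cloudflare uses, so treat it as an approximation; the function name regex_rule_matches and the sample values are made up.

```python
import re

# Same pattern as in the rule: either an empty value, or
# <scheme>://<at least one more character>.
SCHEME_RE = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*://.+)?$")


def regex_rule_matches(referer: str) -> bool:
    """Return True if the emulated rule would fire (pattern does NOT match)."""
    return SCHEME_RE.search(referer) is None


# Hypothetical sample values
print(regex_rule_matches(""))                               # False: empty is allowed
print(regex_rule_matches("https://example.com/"))           # False
print(regex_rule_matches("android-app://com.example.app"))  # False: any valid scheme passes
print(regex_rule_matches("example.com/page"))               # True: scheme missing
print(regex_rule_matches("http://"))                        # True: nothing after the scheme
```

Because the pattern is anchored at both ends, a value like example.com/?next=http://other.example/ is caught here as well, which a plain sub-string check would let through.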
Challenge or Block?
As always, this is up to you and how strict you want to be. Considering that real user requests should never contain a referrer with a broken scheme, we could relatively safely go for a proper block in this case.
And, as usual, don’t just copy and paste things. First evaluate whether a new rule fits your site setup, and be careful when making such changes, as they could break your site if not implemented with care. Also pay attention to the order of your firewall rules, as they are evaluated in order.
Ceterum censeo, Flexible mode is insecure and should be deprecated for the sake of the security of the Internet.