FAQ: URL Normalization

Yesterday (8th April) we launched a new feature called ‘URL Normalization’. You can find it within the new ‘Rules’ settings section in the UI.

:movie_camera: Overview of URL Normalization
The intention of normalization is to prevent malicious actors from manipulating HTTP requests to bypass security settings. It does this by normalizing all requests to a standard format, which in turn allows you as the user to write rule filters predictably.

The easiest way to explain normalization is with an example. Suppose you have a firewall rule of (http.request.uri contains "/login"). Without normalization, I can send an HTTP request with the ‘l’ percent-encoded, e.g. curl --path-as-is https://www.example.com/%6cogin. This request will be allowed through your firewall and will reach your origin server/application.

Now, if you enable URL Normalization for ‘incoming’ URLs only, Cloudflare will see the request with the URI path /%6cogin and normalize it to /login before the request is seen by any other Cloudflare product. This means that Firewall Rules now see ‘/login’ and block the request correctly.
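
To make the bypass concrete, here is a minimal Python sketch. It assumes the rule behaves like a plain substring check and that normalization decodes percent-encoded unreserved characters (as defined in RFC 3986), which is what turns %6c back into l. decode_unreserved and rule_matches are hypothetical helpers for illustration, not Cloudflare code.

```python
import re

# Unreserved characters as defined by RFC 3986: letters, digits, "-", ".", "_" and "~".
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


def decode_unreserved(path: str) -> str:
    """Decode %XX escapes only when they encode an unreserved character."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, path)


def rule_matches(uri: str) -> bool:
    # Stand-in for the firewall expression (http.request.uri contains "/login").
    return "/login" in uri


raw_path = "/%6cogin"
print(rule_matches(raw_path))                     # False: the raw path slips past the rule
print(rule_matches(decode_unreserved(raw_path)))  # True: "/login" now matches
```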

Valid requests such as https://www.example.com/legit%2Drequest will be normalized to https://www.example.com/legit-request and analysed by your rules; if they are not blocked, they will be sent through to the origin, but with the original URI path of /legit%2Drequest.

This is done to avoid breaking any origin-side applications, such as APIs, that rely on encoded requests.

If you also enable normalization ‘to origin’, then requests to https://www.example.com/legit%2Drequest will be normalized for processing by all Cloudflare products and will also be sent to the origin server with the URI path /legit-request.

:desktop_computer: Two Options
With URL Normalization there are two options you can enable/disable:

The first option, ‘Normalize incoming URLs’, will take all HTTP requests received by your Cloudflare zone and normalize them using our new normalization managed ruleset.
This ensures that products including Firewall Rules, Page Rules, Workers, Transform Rules and others all receive a consistent input.

This is crucial to ensure that you, as the administrator, have even greater confidence that a rule to block requests to ‘/login’ will work as expected, whether the URI path is /login, /%6cogin, //login, //%6cogin, and so forth. All variations will be normalized to /login and then processed by your rule.

The second option, ‘Normalize URLs to origin’, can only be enabled in conjunction with ‘Normalize incoming URLs’. This option, when enabled, will also send the normalized HTTP request to the origin server.
In the earlier example, with ‘Normalize URLs to origin: off’, Firewall Rules will see ‘/login’, but the origin server will still see the path exactly as it was sent: /login, /%6cogin, //login, //%6cogin, etc.
With ‘Normalize URLs to origin: on’, the origin server will also see the normalized value of /login.
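
The difference between the two options can be modelled as a small function. This is a hypothetical Python sketch (the Mode, Paths and route names are ours, not part of any Cloudflare API): it takes the raw path and an already-normalized path and returns what Cloudflare products evaluate versus what the origin receives. Modelling the settings as three states rather than two independent on/off flags mirrors the rule that ‘Normalize URLs to origin’ can only be enabled together with ‘Normalize incoming URLs’.

```python
from enum import Enum
from typing import NamedTuple


class Mode(Enum):
    OFF = "off"            # neither option enabled
    INCOMING = "incoming"  # 'Normalize incoming URLs' only
    BOTH = "both"          # 'Normalize incoming URLs' + 'Normalize URLs to origin'


class Paths(NamedTuple):
    seen_by_rules: str   # path evaluated by Firewall Rules, Page Rules, Workers, etc.
    sent_to_origin: str  # path forwarded to the origin server


def route(raw_path: str, normalized_path: str, mode: Mode) -> Paths:
    """Return which path Cloudflare products see and which the origin receives."""
    if mode is Mode.OFF:
        return Paths(raw_path, raw_path)
    if mode is Mode.INCOMING:
        return Paths(normalized_path, raw_path)  # origin still gets the original form
    return Paths(normalized_path, normalized_path)


print(route("/legit%2Drequest", "/legit-request", Mode.INCOMING))
# Paths(seen_by_rules='/legit-request', sent_to_origin='/legit%2Drequest')
```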

:toolbox: Normalization Techniques
Regarding examples of normalization, the simplest one to explain is ‘../’, which is a common path traversal attack. With normalization enabled this will be normalized to ‘/’ instead, meaning any firewall rule looking for ‘../’ will no longer match (as it won’t ever see ‘../’ again) and can be safely deleted.

Common examples of “path-resolution normalization” are:
\ becomes /
* becomes */
// becomes /
Leading ./ or ../ becomes /
Trailing /. becomes /
/./ becomes /
/../ becomes /
Resolve the remaining path, e.g. /a/b/../c becomes /a/c
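
The path-resolution rules above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (normalize_path is a hypothetical helper; a ".." at the root is simply dropped; the "* becomes */" rule is not modelled), not Cloudflare's implementation. It works on an already-decoded path, so percent-encoded sequences such as %2F are left for the separate percent-encoding step.

```python
import re


def normalize_path(path: str) -> str:
    """Sketch of path-resolution normalization on an already-decoded path."""
    path = path.replace("\\", "/")      # \ becomes /
    path = re.sub(r"/{2,}", "/", path)   # // becomes /
    # Remember whether the result should end in "/" (e.g. a trailing /. or /..).
    trailing_slash = path.endswith(("/", "/.", "/..")) or path in (".", "..")
    segments = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue                     # drop empty and "." segments
        if segment == "..":
            if segments:
                segments.pop()           # ".." removes the previous segment
            continue                     # at the root there is nothing to remove
        segments.append(segment)
    result = "/" + "/".join(segments)
    if trailing_slash and segments:
        result += "/"
    return result


print(normalize_path("\\login"))      # /login
print(normalize_path("/a/b/../c"))    # /a/c
print(normalize_path("/../../../"))   # /
```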

Another example would be a Firewall Rule looking for "..%2F" within the URI path, where %2F is the percent-encoded form of ‘/’. With normalization enabled this will be normalized to ‘/’, again meaning the firewall rule won’t trigger.

An overview of how we implement percent-encoding normalization:

  1. Do not encode or decode “reserved characters”.
  2. Decode percent-encoded “unreserved characters” (letters, digits, -, ., _ and ~), e.g. %6c becomes l and %2D becomes -.
  3. For any other character, percent-encode (i.e. if we have a literal byte value of 0xb9, represent that as %B9).
  4. Convert any percent-encoded forms to upper case.
  5. Spaces (%20) remain unchanged.
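
Below is a minimal Python sketch of these percent-encoding rules. It assumes RFC 3986's definitions of reserved and unreserved characters, and that percent-encoded unreserved characters are decoded, as in the /%6cogin and /legit%2Drequest examples earlier. normalize_percent_encoding is a hypothetical name; it operates on raw bytes so that a literal 0xB9 byte can be represented as %B9. It illustrates the rules, not Cloudflare's implementation.

```python
# RFC 3986 character classes, as byte values.
RESERVED = set(b":/?#[]@!$&'()*+,;=")
UNRESERVED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
HEX_DIGITS = set(b"0123456789ABCDEFabcdef")


def normalize_percent_encoding(raw: bytes) -> str:
    out = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        is_escape = (
            byte == ord("%")
            and i + 2 < len(raw)
            and raw[i + 1] in HEX_DIGITS
            and raw[i + 2] in HEX_DIGITS
        )
        if is_escape:
            value = int(raw[i + 1:i + 3], 16)
            if value in UNRESERVED:
                out.append(chr(value))               # %6c -> l, %2D -> -
            else:
                out.append("%{:02X}".format(value))  # stays encoded, upper-cased: %2f -> %2F
            i += 3
        elif byte in RESERVED or byte in UNRESERVED:
            out.append(chr(byte))                    # literal reserved/unreserved kept as-is
            i += 1
        else:
            out.append("%{:02X}".format(byte))       # any other byte: 0xB9 -> %B9, space -> %20
            i += 1
    return "".join(out)


print(normalize_percent_encoding(b"/%6cogin"))   # /login
print(normalize_percent_encoding(b"/a%2fb"))     # /a%2Fb
print(normalize_percent_encoding(b"/caf\xb9"))   # /caf%B9
```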

:email: I received an email saying this hadn’t been enabled for me?
As part of the roll-out, we decided to opt out all zones whose Firewall Rules contained patterns or characters that would be affected if normalization were enabled, to prevent any operational impact.

If you received this email, there are two approaches you can take, per the KB article:

  1. Edit your Firewall Rules to change http.request.uri to raw.http.request.uri. This means that the firewall rule will still see the raw, un-normalized values even when URL Normalization is enabled.
  2. Update your Firewall Rules for the normalized input, e.g. you no longer need (http.request.uri contains "..%2F") and (http.request.uri contains "../"), as both will be normalized to ‘/’.
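
For option 1, the change is mechanical enough to script. The hypothetical Python helper below (to_raw_fields is our name, not a Cloudflare tool) rewrites the exact field names http.request.uri and http.request.full_uri to raw.http.request.uri and raw.http.request.full_uri, leaving fields that are already raw, and sub-fields such as http.request.uri.path, untouched. It is a naive textual rewrite that does not parse the expression, so review the result before saving it to your rules.

```python
import re

# Exact field names only: skip fields already prefixed with "raw." and
# sub-fields such as http.request.uri.path.
NORMALIZED_FIELD = re.compile(r"(?<!raw\.)\bhttp\.request\.(?:uri|full_uri)(?![\w.])")


def to_raw_fields(expression: str) -> str:
    """Prefix http.request.uri / http.request.full_uri with "raw." (textual rewrite)."""
    return NORMALIZED_FIELD.sub(lambda match: "raw." + match.group(0), expression)


print(to_raw_fields('(http.request.full_uri contains "..%2F")'))
# (raw.http.request.full_uri contains "..%2F")
```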

Commonly seen Firewall Rules that would conflict with URL Normalization are:
(http.request.full_uri contains "..%2F")
(http.request.uri contains "%3F")
(http.request.full_uri contains "../../../")
(http.request.full_uri contains "c%3A%5C")
(http.request.full_uri contains "..%2F..%2F")

With normalization enabled, these will all be decoded and normalized to their true ‘form’, e.g. "c%3A%5C" will become c:/.
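
If you want a rough idea of which of your own rules fall into this category, one option is to scan their string literals for the characters normalization rewrites. The Python sketch below is a hypothetical heuristic of our own (affected_literals and SUSPECT_SEQUENCES are illustrative, not the check used for the roll-out): it flags any double-quoted literal containing %, a backslash, ./ or //, and does not look at which field the literal is compared against.

```python
import re
from typing import List

# Hypothetical heuristic: characters and sequences that normalization may rewrite.
SUSPECT_SEQUENCES = ("%", "\\", "./", "//")
# Double-quoted string literals, allowing backslash escapes inside them.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def affected_literals(expression: str) -> List[str]:
    """Return the string literals in a rule expression that normalization could change."""
    return [
        literal
        for literal in STRING_LITERAL.findall(expression)
        if any(seq in literal for seq in SUSPECT_SEQUENCES)
    ]


rules = [
    '(http.request.full_uri contains "..%2F")',
    '(http.request.full_uri contains "../../../")',
    '(http.request.uri contains "/login")',
]
for rule in rules:
    print(rule, "->", affected_literals(rule))
```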

Link to the KB article: Required Firewall Rule changes to enable URL Normalization – Cloudflare Help Center

:email: I received an email saying this hadn’t been enabled for me, but it appears to be enabled in the UI?
This was an unfortunate UI bug: the UI was reading the incorrect value from the API. We have now fixed this. Rest assured, normalization was never enabled on your zone; it was simply a cosmetic bug that reported an incorrect normalization state.

:question: Is there any reason not to enable Normalize URLs to Origin?
The vast majority of setups are fine to normalize to origin; the exception is if you have applications or origin-side configurations that expect percent-encoded URLs (such as certain legacy applications and some API gateways).
That’s why we decided not to do this globally for everyone, and instead to apply it only to ‘what Cloudflare products see’, so that Firewall Rules, Page Rules, Workers, Transform Rules and others all see a consistent input, regardless of the tricks malicious actors may use, such as path traversal and encoding.
If your origin doesn’t need to receive encoded traffic, then you’re fine to turn this on in both directions.