Enhancement for crawler/bot matching


#1

Currently, cf.client.bot is a boolean flag that indicates whether the request came from a known crawler. While this allows for some generic filtering, it would be nice to have more granular control over such requests, for example to block (or allow) only specific crawlers.
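
For context, the most granular thing possible today is matching on the flag itself, i.e. an expression like the following (paired with a Block or Allow action in the dashboard), which hits all known crawlers or none:

cf.client.bot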

For this I’d like to suggest cf.client.crawler (a different name to keep backwards compatibility with the current flag, though any other name would be fine too), which would contain an optional lowercase string identifying the crawler that sent the request (or null/undefined/etc. when the request does not come from a known crawler).

This would allow for a more flexible configuration of the following type:
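
(A sketch, assuming the proposed field existed; this is the same expression as in post #5 below.)

cf.client.crawler in {"google" "bing"}

Paired with a Block (or Allow) action, this would target only the listed crawlers instead of every known bot.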


#2

A gentle nudge :slight_smile:


#3

A mighty unpopular idea apparently :smile:

Comments appreciated.


#4

My brain overloaded when you wanted to allow for a “more flexible” configuration.

Why not combine the existing Known Crawler rule with a User Agent string rule?


#5

That’s a fair point, and I didn’t consider it earlier.
Though I’d still believe

cf.client.crawler in {"google" "bing"}

is easier, more straightforward, and less error-prone than

cf.client.bot and (http.user_agent contains "Google" or http.user_agent contains "Bing")

The latter also requires one to either know all applicable user agent strings or to find a common pattern and match it with contains.