My firewall rule sometimes blocks DuckDuckBot, but I’m unable to understand why.
The rule expression is:
http.request.uri.path contains ".jpg" and not (http.referer contains "example.com" or cf.client.bot)
with Block as the action.
(My website name is replaced with example.com.)
The goal of this rule is to prevent hotlinking of images from any site other than example.com, by checking whether the HTTP referer field contains my domain. The rule should also allow CF verified bots to access the images without having to provide the referer field.
The CF log shows that the following request was blocked:
{
  "action": "block",
  "clientASNDescription": "MICROSOFT-CORP-MSN-AS-BLOCK",
  "clientAsn": "8075",
  "clientCountryName": "US",
  "clientIP": "40.64.105.247",
  "clientRequestHTTPHost": "image.example.com",
  "clientRequestHTTPMethodName": "GET",
  "clientRequestHTTPProtocol": "HTTP/1.1",
  "clientRequestPath": "product.jpg",
  "clientRequestQuery": "",
  "datetime": "2022-05-07T16:30:36Z",
  "rayName": *************,
  "ruleId": *************,
  "rulesetId": "",
  "source": "firewallrules",
  "userAgent": "DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)",
  "matchIndex": 0,
  "metadata": [
    {
      "key": "filter",
      "value": **********
    },
    {
      "key": "type",
      "value": "customer"
    }
  ],
  "sampleInterval": 1
}
But in my understanding, it shouldn't be blocked, because DuckDuckBot should be a verified bot according to https://radar.cloudflare.com/verified-bots. If cf.client.bot is true, the part in parentheses is true, so not (...) should evaluate to false. The whole rule expression should then be false and should not match. According to the log, however, it does match, which means that this DuckDuckBot request was not seen as coming from a verified bot.
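The reasoning above can be checked with a small Python sketch. This is only a simplified model of the rule expression, not Cloudflare's actual evaluator; the function name rule_matches is made up for illustration, and the empty referer is an assumption, since the log entry does not show the Referer header.

```python
def rule_matches(path: str, referer: str, verified_bot: bool) -> bool:
    """Simplified model of the firewall rule expression (not Cloudflare's evaluator)."""
    return ".jpg" in path and not ("example.com" in referer or verified_bot)


# Request path taken from the log entry; the empty referer is an assumption.
as_verified_bot = rule_matches("product.jpg", "", verified_bot=True)
as_unverified = rule_matches("product.jpg", "", verified_bot=False)

print(as_verified_bot)  # False: the rule does not match, the request is allowed
print(as_unverified)    # True: the rule matches, the request is blocked
```

Under this model, the only way the logged request can be blocked is if cf.client.bot was false for it and the referer did not contain example.com.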
Also, my web server log shows that some HTTP requests from DuckDuckBot do reach the server, but from different IP addresses, such as 52.146.59.154, 52.143.241.111, 20.72.242.93, 51.104.146.235, 40.89.243.175. These are all Microsoft IP addresses.
DuckDuckGo officially lists the following IP addresses for DuckDuckBot:
20.191.45.212
40.88.21.235
40.76.173.151
40.76.163.7
20.185.79.47
52.142.26.175
20.185.79.15
52.142.24.149
40.76.162.208
40.76.163.23
40.76.162.191
40.76.162.247
These are also Microsoft addresses. DuckDuckGo probably uses Bingbot, but it seems that DuckDuckGo has not listed all of its IP addresses on the DuckDuckBot page.
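As a quick sanity check, the addresses can be compared with a few lines of Python. This sketch uses only the IPs quoted in this post (the published list, the blocked client IP from the CF log, and the addresses from my web server log); the variable names are arbitrary, and it says nothing about which list Cloudflare itself uses.

```python
# IP addresses published by DuckDuckGo for DuckDuckBot, as quoted above.
PUBLISHED = {
    "20.191.45.212", "40.88.21.235", "40.76.173.151", "40.76.163.7",
    "20.185.79.47", "52.142.26.175", "20.185.79.15", "52.142.24.149",
    "40.76.162.208", "40.76.163.23", "40.76.162.191", "40.76.162.247",
}

# Blocked client IP from the CF log, followed by the IPs seen in the web server log.
OBSERVED = [
    "40.64.105.247",
    "52.146.59.154", "52.143.241.111", "20.72.242.93",
    "51.104.146.235", "40.89.243.175",
]

# Keep every observed address that is missing from the published list.
unlisted = [ip for ip in OBSERVED if ip not in PUBLISHED]
print(f"{len(unlisted)} of {len(OBSERVED)} observed IPs are not in the published list")
```

None of the observed addresses, including the blocked one, appears in the published list.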
What could be the reason that my CF rule blocked the DuckDuckGo request?
Maybe the verified bot mechanism of CF doesn't take all IP addresses of DuckDuckBot into account?