Auto-closing topics too quickly really is an issue here. There are dozens of auto-closed topics about this problem, yet it was never resolved. I am linking just one here and creating this as a follow-up, in the hope that it is not auto-closed again: Cloudflare blocking bingbot crawl
@tye730 @fritex pinging you here, as you are probably interested.
Today I switched to the new managed WAF rules, and watched the event log. Cloudflare’s own ruleset triggered a block of the following request:
{
"action": "block",
"clientASNDescription": "MICROSOFT-CORP-MSN-AS-BLOCK",
"clientAsn": "8075",
"clientCountryName": "US",
"clientIP": "40.77.202.147",
"clientRequestHTTPHost": "dietpi.com",
"clientRequestHTTPMethodName": "POST",
"clientRequestHTTPProtocol": "HTTP/2",
"clientRequestPath": "/matomo/matomo.php",
"clientRequestQuery": "?action_name=Profile%20-%20helio58%20-%20DietPi%20Community%20Forum&idsite=1&rec=1&r=936160&h=9&m=57&s=18&url=https%3A%2F%2Fdietpi.com%2Fforum%2Fu%2Fhelio58&_id=23275a695d662683&_idn=1&send_image=0&_refts=0&pv_id=ufOegz&pf_net=0&pf_srv=18&pf_tfr=0&pf_dm1=63&uadata=%7B%7D&cookie=1&res=320x568",
"datetime": "2024-02-24T17:57:18Z",
"rayName": "85a99713ba28307c",
"ruleId": "ae20608d93b94e97988db1bbc12cf9c8",
"rulesetId": "efb7b8c949ac4650a09736fc376e9aee",
"source": "firewallManaged",
"userAgent": "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
"matchIndex": 0,
"metadata": [
{
"key": "ruleset_version",
"value": "184"
},
{
"key": "version",
"value": "184"
},
{
"key": "type",
"value": "customer"
}
],
"sampleInterval": 1
}
This rule is named "Anomaly:Header:User-Agent - Fake Bing or MSN Bot". However, looking at the user agent, it is the correct BingBot one. Probably Cloudflare expects the old user agent, which changed 2 years ago: Announcing user-agent change for Bing crawler bingbot | Bing Webmaster Blog
Comparing the user agents, with the expected one in the first line and the one that triggered the WAF rule in the second:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Apart from the Chrome version placeholder W.X.Y.Z, this is a perfect match, I would say, hence the Cloudflare managed rule seems to be wrong.
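To make the comparison reproducible, here is a minimal Python sketch that turns the documented template into a regular expression and checks the logged user agent against it. It assumes that W.X.Y.Z stands for four dot-separated numbers; the helper names `template_to_regex` and `matches_template` are mine, not anything Cloudflare or Bing provide.

```python
import re

# Documented template from the Bing blog post; W.X.Y.Z is the Chrome version placeholder.
TEMPLATE = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
    "(compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
)

# User agent from the blocked request in the event log.
OBSERVED = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Mobile Safari/537.36 "
    "(compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
)


def template_to_regex(template):
    """Escape the template literally, then let the placeholder match a version."""
    escaped = re.escape(template)
    # Assumption: the placeholder is four dot-separated integers.
    version = r"\d+\.\d+\.\d+\.\d+"
    return re.compile(escaped.replace(re.escape("W.X.Y.Z"), version))


def matches_template(user_agent, template=TEMPLATE):
    """Return True if the whole user agent matches the template."""
    return template_to_regex(template).fullmatch(user_agent) is not None


if __name__ == "__main__":
    print(matches_template(OBSERVED))
```

This only shows that the header has the documented shape. Anyone can send this string, so it says nothing about whether the client really is BingBot.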
Or do I misunderstand the rule, and it triggers when a client sends the BingBot user agent but cannot be the real BingBot, based on its IP range or other request parameters?
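If the rule works that way, the check would probably resemble the reverse DNS verification that Bing recommends to webmasters, as far as I understand it: resolve the IP to a hostname, require it to end in search.msn.com, then resolve that hostname forward and confirm it returns the same IP. A rough Python sketch under that assumption follows. The function name `is_verified_bingbot` and the injectable lookup parameters are hypothetical and only there to make it testable without network access; I do not know how Cloudflare's rule is actually implemented.

```python
import socket


def is_verified_bingbot(
    ip,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
):
    """Forward-confirmed reverse DNS check for a claimed BingBot IP."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure

    # Assumption: genuine BingBot hosts live under search.msn.com.
    if not hostname.lower().rstrip(".").endswith(".search.msn.com"):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The forward lookup must lead back to the original IP.
    return ip in addresses


if __name__ == "__main__":
    # IP taken from the blocked request above; the result depends on live DNS.
    print(is_verified_bingbot("40.77.202.147"))
```

If this returns True for the IP in the log, the request came from the real BingBot and the block would be a false positive.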