Cloudflare Firewall Blocking Legitimate Google/Bing bots

Approximately 3 days ago, Cloudflare began blocking Google Shopping’s fetch of my product feed XML file from my website, which sits behind the Cloudflare CDN. I only noticed it today while reviewing logs in Google Merchant Center, which was reporting an “authorization error” even though the fetch doesn’t require authorization. The same thing was happening with Bing. Reviewing the firewall logs at CF, I found the legitimate requests were being blocked as “Fake”… Cloudflare Specials rules 100201 and 100202. The workaround is to disable these two rules, which then allows the bots to fetch normally.

This appears to be a regression of the issue reported in “Cloudflare Managed Special rules are blocking Googlebot”.

This has been documented and reported to CF support in a ticket, but as of this post (about 90 minutes after the ticket was opened) there has been no reply.

FYI… I’m posting here because it’s possible this is widespread and may be impacting more than just my shopping feed; I’m assuming it would also block ordinary search engine crawls. As of now there is no mention of the issue on the CF status/incident pages, but it could be disastrous for SEO if it is affecting other websites.

An indication of the problem as viewed in the CF Firewall Events log…
Legitimate Googlebot request coming from an ASN identified by CF as Google, yet triggered the 100201 Fake Google Bot rule…

The requests come from Google’s network but are most likely not from Google’s crawler. That is probably just someone running their software on Google’s infrastructure and posing as “Google”. Google’s crawlers mostly come from the 66.249.x.x range, AFAIK.

Furthermore, “Googlebot” is never the sole part of the user agent. In short, Cloudflare would seem to be right to block in these cases.


Yes, this behavior is intended.

Those requests are indeed coming from Fake Google Bots. The correct User-Agents that Google will use are described at: https://support.google.com/webmasters/answer/1061943. Validation can be done with a reverse lookup described here: https://support.google.com/webmasters/answer/182072.

For example, the 1st IP in your logs, 74.125.191.161, doesn’t have the correct hostname (or any hostname at all, in this case):

$ host 74.125.191.161
Host 161.191.125.74.in-addr.arpa. not found: 3(NXDOMAIN)
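
If you want to script the same check instead of running host by hand, here is a minimal Python sketch of the reverse-then-forward lookup that Google's verification page describes. The function names and the suffix list (googlebot.com and google.com) are my own assumptions for illustration, not anything Cloudflare or Google ships, so check the suffixes against Google's current documentation before relying on them. The resolver functions are parameters only so the logic can be exercised without touching DNS.

```python
import socket

# Assumption: hostname suffixes treated as Google-owned. Confirm against
# Google's verification page before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """Return True if the hostname ends with one of the assumed Google suffixes."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        hostname = reverse(ip)[0]
    except OSError:
        # No PTR record (the NXDOMAIN case above) or another lookup failure
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The forward lookup must map back to the IP we started with
    return ip in addresses
```

Calling verify_googlebot("74.125.191.161") with the default resolvers should come back False for as long as that address has no PTR record, which matches the host output above.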