We’re seeing real Google bot requests being blocked as fake Google bots by firewall rules 100035C and 100035D. We know the requests are real based on the IP address (and a check on http://whois.domaintools.com/). Short term, we’ll fix this on our end by shutting off those rules. However, we’ll lose the ability to block fake Google bot attacks, and other Cloudflare users may not be aware of this issue, so it would be best for Cloudflare to fix this on their end.
Should have also included some of the blocked IP addresses:
They are not Google Bots; they are machines on Google Cloud rented out to customers who pretend to be Google Bots.
You should block them.
How have you determined those are real googlebot requests?
dig -x 18.104.22.168 +short
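A reverse lookup alone does not settle the question, since a PTR record can point anywhere. The usual check is forward-confirmed reverse DNS: resolve the IP to a hostname, make sure the hostname is under a Google crawler domain, then resolve that hostname back and confirm it returns the same IP. Here is a minimal Python sketch of that idea. The function name, the injectable lookup parameters, and the accepted suffix list are my own assumptions for illustration, and the default forward lookup only handles IPv4.

```python
import socket

# Assumption: crawler hostnames end in these domains. Google Cloud customer
# machines typically resolve under googleusercontent.com, so that domain is
# deliberately left out of this sketch.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_google_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a client IP (hypothetical helper).

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    can be injected for testing; by default they use the system resolver.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse lookup (the equivalent of `dig -x <ip> +short`).
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False

    # Step 2: the hostname must sit under a Google crawler domain.
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False

    # Step 3: the forward lookup must map back to the original IP.
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Calling `is_verified_google_crawler(ip)` with one of the blocked addresses would show whether it passes all three steps or only looks like Google because it sits in Google's address space.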
We’re trying to troubleshoot an issue with Google Pagespeed Insights where it throws an error every time it analyzes our site (useless error code, and Google isn’t being responsive). The timing of the blocks for the IPs I listed corresponds with our Pagespeed attempts, so we’re pretty sure they are “real” Google. However, I probably misspoke by calling it Google bot, as it’s not the search bot, but the Pagespeed Insights bot.
One big problem with Pagespeed Insights is that their test may come from any Google location. That in and of itself generates lots of false reports as far as page load speed goes, because each time you run the test it may hit a different Cloudflare datacenter, where your static files may not yet be cached.
To run the tests, what you can do is go to Cloudflare’s Dashboard > Firewall > Tools > User Agent Blocking, then create a rule to Whitelist their user agent. After you run the page speed tests, make sure you un-whitelist the user agent, as you don’t want someone spoofing their UA to get a free pass.
For more info on this, here is the user agent that was being blocked: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html). The agent was coming from Google servers based on reverse IP lookups. According to Google, this user agent is Googlebot (desktop).
We’re actually seeing a lot of traffic from this agent, so our earlier assessment that it was just for Pagespeed Insights appears to be wrong.
We’ve created a firewall rule to whitelist this user agent, but we’re concerned that Cloudflare is blocking by default what appears to be a valid Google agent, and that whitelisting the agent in a firewall rule will allow spoofed versions of it to get through. It’s not clear how to allow one but prevent the other.
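One way to think about allowing the real agent while stopping spoofed ones is to never trust the user agent by itself, and to pair it with something the client cannot fake, such as the source IP. Google publishes the IP ranges its crawlers use, so a rough sketch of the decision could look like the following. The function name, the three return labels, and the sample networks are hypothetical; the networks shown are documentation ranges, not Google's real ones.

```python
import ipaddress


def classify_request(user_agent, client_ip, trusted_networks):
    """Decide what to do with a request (hypothetical helper).

    Returns 'pass' if the request does not claim to be Googlebot,
    'allow' if it claims to be Googlebot and comes from a trusted network,
    and 'block' if it claims to be Googlebot from anywhere else.
    """
    # Requests that do not claim to be Googlebot are left to other rules.
    if "googlebot" not in user_agent.lower():
        return "pass"

    addr = ipaddress.ip_address(client_ip)
    for cidr in trusted_networks:
        # An IPv4 address is never "in" an IPv6 network and vice versa,
        # so mixed lists are safe to iterate over.
        if addr in ipaddress.ip_network(cidr):
            return "allow"

    # Claims to be Googlebot but is not in a published crawler range.
    return "block"


# Placeholder range for illustration only; real ranges would come from
# Google's published crawler list.
SAMPLE_NETWORKS = ["192.0.2.0/24"]
```

With this shape, the whitelist is effectively "Googlebot user agent AND Google crawler IP", so a spoofed user agent from any other address still gets blocked.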