Cloudflare Managed Special rules are blocking Googlebot

Fortunately, the Cloudflare team was quick (as always) and started working on the incident shortly after @cali_b reported it.

Current status

Resolved – WAF blocking some legitimate search engine crawls

Even now that it has been fixed, some users will choose to keep these rules disabled rather than risk their websites’ visibility on Google and other search engine result pages (SERPs), which could mean a huge loss in some cases.

So I decided to post this comment to add to the discussion.


Workaround

As @cs-cf said, the best thing to do is to disable only the affected rules individually, so you don’t lose the protection of all the other Cloudflare Specials rules.

Step-by-step

  1. Click the Firewall tab
  2. Click the Managed Rules sub-tab
  3. Scroll to the Cloudflare Managed Ruleset section
  4. Click the Advanced link, just above Help
  5. In the modal, change the search from Description to ID
  6. Search for 100035 and check carefully which rules to disable
  7. Change the Mode of the chosen rules to Disable

Rules matching the search

  • 100035 - Fake google bot, based on partial useragent match and ASN
  • 100035B - Prevent fake bingbots from crawling
  • 100035C - Fake google bot, based on exact useragent match and DNS lookup
  • 100035D - Fake google bot, based on partial useragent match and DNS lookup
  • 100035U - Prevent fake BaiduBots from crawling
  • 100035Y - Prevent fake yandexbot from crawling
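
Several of these rules (100035C and 100035D) mention a DNS lookup. Cloudflare doesn’t document how its rules work internally, but Google, Bing and Yandex all publicly recommend verifying their crawlers with a reverse DNS lookup of the IP, followed by a forward lookup of the resulting hostname that must return the original IP. Below is a minimal Python sketch of that check, which you can use to confirm logged IPs yourself. The suffix table and function names are my own assumptions, so please double-check the suffixes against each search engine’s current documentation.

```python
import socket
from typing import Callable, List

# Hypothetical table: hostname suffixes each search engine lists for its
# crawlers (assumption; check each engine's current verification guide).
TRUSTED_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
    "yandexbot": (".yandex.ru", ".yandex.net", ".yandex.com"),
}


def reverse_lookup(ip: str) -> str:
    # PTR lookup; raises socket.herror (an OSError) if no PTR record exists.
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    # A-record lookup; gethostbyname_ex returns (hostname, aliases, ip_list).
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    bot: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR hostname must end in a trusted
    suffix, and resolving that hostname must return the original IP."""
    suffixes = TRUSTED_SUFFIXES[bot]
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot be verified
    if not host.endswith(suffixes):
        return False  # PTR points outside the search engine's domains
    try:
        return ip in forward(host)
    except OSError:
        return False  # hostname does not resolve back to any address
```

With network access, `is_verified_crawler("66.249.66.215", "googlebot")` runs the check against one of the IPs I observed. The `reverse` and `forward` parameters exist only so the logic can be tested with fake resolvers instead of live DNS.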

A seventh rule related to fake bots was deployed during the incident:

  • 100035_BETA - Fake google bot, based on partial useragent match and ASN

Judging by its description, it may be intended to replace 100035. The rule was made available with its default mode set to Simulate and was not triggered on any of the accounts I manage.

First time changing specific rules

After you disable the chosen rules and close the modal, a blue icon is permanently displayed next to the Cloudflare Specials group. Hovering over it shows the message “x rules modified”, where x is the number of modified rules.

I couldn’t find a way to see which rules were changed afterwards, so keep a note of the IDs of the rules you disable. That way you can easily re-enable them later by searching for their IDs instead of going through all the rules.

Saving this post’s permalink may help:

  • https://community.cloudflare.com/t/Cloudflare-managed-special-rules-are-blocking-googlebot/82911/14

Observations

I can confirm the same behavior for rules 100035, 100035B and 100035Y.

100035

Fake google bot, based on partial useragent match and ASN

IP addresses

  • 66.249.66.215
  • 66.249.66.217
  • 66.249.66.219

All IPs belong to AS15169 (Google LLC).

UA strings

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

All UAs are listed in the Google crawlers documentation.

100035B

Prevent fake bingbots from crawling

IP addresses

  • 157.55.39.188
  • 157.55.39.189
  • 157.55.39.191
  • 157.55.39.238
  • 207.46.13.50
  • 207.46.13.216

All IPs belong to AS8075 (Microsoft Corporation).

UA strings

  • Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

The UA is listed in the Bing crawlers documentation.

100035Y

Prevent fake yandexbot from crawling

IP addresses

  • 5.255.250.15
  • 178.154.246.137

All IPs belong to AS13238 (Yandex LLC).

UA strings

  • Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

The UA is listed in the Yandex crawlers documentation.
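
Rule 100035 is described as a partial user agent match, while 100035C uses an exact one. Cloudflare’s actual rule logic isn’t public, so as a hedged illustration only, this Python sketch contrasts the two ideas: a case-insensitive token match that tells which crawler a UA claims to be, and an exact match against an allow-list built solely from the UA strings I observed above. All names here are hypothetical.

```python
from typing import Optional

# Tokens each crawler includes in its UA (partial, case-insensitive match).
BOT_TOKENS = ("googlebot", "bingbot", "yandexbot")

# Exact UA strings observed in the blocked requests above (exact match).
KNOWN_UAS = {
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
}


def claimed_bot(ua: str) -> Optional[str]:
    """Return the crawler a UA claims to be, or None (partial match)."""
    lowered = ua.lower()
    for token in BOT_TOKENS:
        if token in lowered:
            return token
    return None


def exact_match(ua: str) -> bool:
    """True only if the UA is byte-for-byte one of the known strings."""
    return ua in KNOWN_UAS
```

Note that a spoofed request copying a real UA passes both checks, which is presumably why these rules pair the UA match with an ASN or DNS check.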


Questions

Please explain the incident itself, how it was fixed, and what has been done to prevent it from happening again.

Why have the requests been blocked if they appear to be authentic?

All crawlers identified themselves with UA strings that match their official documentation, and their requests came from IP addresses in the ASNs of the companies they belong to.

Have fixes been implemented for all rules related to fake bots?

Rule 100035_BETA (temporarily made available during the incident) only covered Googlebot; no equivalent rule was created for the other affected crawlers.

What are the chances of it happening again?

Since we know nothing about what caused the incident or how it was resolved, it is impossible to assess the likelihood of it happening again.

Can Cloudflare users feel safe?

Some businesses depend heavily on search-driven traffic. If their websites were penalized or de-indexed (partially or completely), the viability of these companies, products, or services could be drastically affected.


I’ve always been (more than) satisfied with Cloudflare, but this problem made me extremely worried.

Please help us understand!

Thanks in advance.
