Cloudflare Managed Special rules are blocking Googlebot

#1

I got into the office today to find half of our Google product listing ads disapproved. Upon investigating, I found that Cloudflare is blocking Googlebot under rule 100035 (and a few others).

Anyone else experiencing this?

3 Likes
#2

Same - Google, Bing, and Yandex were all being blocked by the 100035 CF Special rule and subrules.

1 Like
#3

Same with my account. Legitimate Google, Bing, and Yandex requests are getting blocked.

1 Like
#4

I opened a support ticket and they said they are aware of the issue and are currently working on a solution. I have turned off the Special rules for now…

#5

For anyone else experiencing this, go to Firewall settings > Managed Rules, and turn off Cloudflare Specials to fix this temporarily until CF has a solution.

#6

It has been going on for many hours now; I checked the logs. It might take weeks for the affected pages to recover in Google.

#7

You can search (under advanced) for the specific rule and just disable it. Cloudflare Specials has a lot of powerful WAF rules in it, so disabling it entirely may be overkill.

2 Likes
#8

cscharff,

Are any rules other than the 100035 series also preventing Google from crawling pages?

1 Like
#9

The reason why I disabled all special rules (temporarily) is that for some reason disabling specific ones did not seem to work for me. There were also at least 6 or 7 that I kept seeing, related to other bots being blocked.

#10

Cali_b,

It might not be a good idea to trust Cloudflare rules right now. Blocking Google has very long-term side effects.

1 Like
#12

Seeing the exact same problem. Disabling rules 100035 and 100035B on page 9 of Cloudflare Specials worked for me.

#13

Looks like a known issue and a fix has been implemented.

https://www.cloudflarestatus.com/incidents/pld51xj2hlpy

1 Like
#14

Fortunately the Cloudflare team was quick (as it always is) and started working on this incident shortly after @cali_b had reported it.

Current status

Resolved – WAF blocking some legitimate search engine crawls

But even after it has been fixed some users will choose to keep these rules disabled due to the possibility of having their websites’ visibility impaired on Google and other SERPs, which could mean a huge loss in some cases.

So I decided to post this comment in an attempt to enrich the discussion.


Workaround

As @cscharff said, the best thing to do is to disable the rules individually so you don’t lose all the other Cloudflare Specials benefits.

Step-by-step

  1. Click the Firewall tab
  2. Click the Managed Rules sub-tab
  3. Scroll to the Cloudflare Managed Ruleset section
  4. Click the Advanced link above the Help
  5. Change from Description to ID in the modal
  6. Search for 100035 and check carefully what to disable
  7. Change the Mode of the chosen rules to Disable

Rules matching the search

  • 100035 - Fake google bot, based on partial useragent match and ASN
  • 100035B - Prevent fake bingbots from crawling
  • 100035C - Fake google bot, based on exact useragent match and DNS lookup
  • 100035D - Fake google bot, based on partial useragent match and DNS lookup
  • 100035U - Prevent fake BaiduBots from crawling
  • 100035Y - Prevent fake yandexbot from crawling

A seventh rule related to fake bots was deployed during the incident:

  • 100035_BETA - Fake google bot, based on partial useragent match and ASN

According to its description, it may be the substitute version of 100035. The rule was made available with its Default mode set to Simulate and was not triggered on any of the accounts I manage.
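To illustrate the difference between the "partial" and "exact" user-agent matches named in the rule descriptions above, here is a rough Python sketch. The actual rule logic is not public, so the patterns and function names below are assumptions for illustration only; the UA string is the one observed in this incident.

```python
import re
from typing import Optional

# Exact Googlebot desktop UA string as observed in this thread.
GOOGLEBOT_DESKTOP_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Hypothetical token patterns; Cloudflare's real matching logic is not public.
CRAWLER_TOKENS = {
    "google": re.compile(r"googlebot", re.IGNORECASE),
    "bing": re.compile(r"bingbot", re.IGNORECASE),
    "yandex": re.compile(r"yandexbot", re.IGNORECASE),
}


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Partial match: return which crawler the UA claims to be, or None."""
    for name, pattern in CRAWLER_TOKENS.items():
        if pattern.search(user_agent):
            return name
    return None


def is_exact_googlebot(user_agent: str) -> bool:
    """Exact match: the whole UA string must equal the desktop Googlebot UA."""
    return user_agent == GOOGLEBOT_DESKTOP_UA
```

A UA match only says what a client claims to be. Anyone can send these strings, which is why the rules pair the match with an ASN or DNS check.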

First time changing specific rules

After you disable the chosen rules and close the modal, a blue icon is permanently displayed next to the Cloudflare Specials group. Hovering over it shows an “x rules modified” message, where x is the number of rules that have been modified.

I couldn’t find a way to see which rules were changed after the fact, so keep a note of the rules you disable. That way you can re-enable them later by searching for their IDs instead of looking through all the rules.

Saving the current page’s permalink may work:

  • https://community.cloudflare.com/t/cloudflare-managed-special-rules-are-blocking-googlebot/82911/14
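Another low-tech option is to keep a local record of the modified rule IDs. Below is a minimal Python sketch of that idea; the function name, file name, and JSON layout are all hypothetical, and it does not talk to Cloudflare in any way.

```python
import json
from datetime import date
from pathlib import Path


def record_disabled_rules(path, rule_ids, note=""):
    """Merge rule IDs into a local JSON file so they can be re-enabled later.

    Existing entries are kept as they are; only new IDs are added.
    """
    path = Path(path)
    record = json.loads(path.read_text()) if path.exists() else {}
    for rule_id in rule_ids:
        record.setdefault(
            rule_id,
            {"disabled_on": date.today().isoformat(), "note": note},
        )
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record
```

For example, `record_disabled_rules("disabled_waf_rules.json", ["100035", "100035B", "100035Y"], note="search crawler incident")` would note the three rules discussed here.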

Observations

I can confirm the same behavior for the rules 100035, 100035B and 100035Y.

100035

Fake google bot, based on partial useragent match and ASN

IP addresses

  • 66.249.66.215
  • 66.249.66.217
  • 66.249.66.219

All IPs belong to AS15169 (Google LLC).

UA strings

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

All UAs are listed in the Google crawlers documentation.

100035B

Prevent fake bingbots from crawling

IP addresses

  • 157.55.39.188
  • 157.55.39.189
  • 157.55.39.191
  • 157.55.39.238
  • 207.46.13.50
  • 207.46.13.216

All IPs belong to AS8075 (Microsoft Corporation).

UA strings

  • Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

The UA is listed in the Bing crawlers documentation.

100035Y

Prevent fake yandexbot from crawling

IP addresses

  • 5.255.250.15
  • 178.154.246.137

All IPs belong to AS13238 (Yandex LLC).

UA strings

  • Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

The UA is listed in the Yandex crawlers documentation.


Questions

Please clarify what happened in the incident, how it was fixed, and what has been done to keep it from happening again.

Why have the requests been blocked if they appear to be authentic?

All crawlers were identified with compatible UA strings and IP addresses, corresponding to their official documentation and the ASN of the companies to which they belong.
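For anyone who wants to check such requests independently, the search engines describe a forward-confirmed reverse DNS check: resolve the IP to a hostname, confirm the hostname is under the crawler's domain, then resolve the hostname back and confirm it returns the same IP. The sketch below is my own illustration of that check in Python, not Cloudflare's rule logic. The domain suffixes are assumptions based on the vendors' public guidance and should be checked against their current documentation.

```python
import socket
from typing import Callable, List

# Assumed hostname suffixes per crawler; verify against vendor documentation.
CRAWLER_DOMAINS = {
    "google": (".googlebot.com", ".google.com"),
    "bing": (".search.msn.com",),
    "yandex": (".yandex.ru", ".yandex.net", ".yandex.com"),
}


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """A-record lookup: hostname to list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    crawler: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    suffixes = CRAWLER_DOMAINS[crawler]
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not hostname.endswith(suffixes):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

Calling `is_verified_crawler("66.249.66.215", "google")` performs live DNS lookups, so the result depends on the network at the time. The resolver functions are passed in as parameters so the check can also be exercised offline with stand-ins.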

Have fixes been implemented for all rules related to fake bots?

The 100035_BETA (temporarily made available during the incident) only covered the Googlebot. No rule was created for the other affected crawlers.

What are the chances of it happening again?

Since we know nothing about what caused the incident or how it was resolved, we cannot judge how likely it is to happen again.

Can Cloudflare users feel safe?

There are businesses heavily dependent on search-driven access. If their websites were penalized or de-indexed (partially or totally), the viability of these companies/products/services could be drastically impacted.


I’ve always been (more than) satisfied with Cloudflare, but this problem made me extremely worried.

Please help us understand!

Thanks in advance.

6 Likes
#15

Couldn’t agree more!

Our lead generation and e-commerce operations were severely affected by this issue. And although I disabled the Special rules within one hour from receiving the Google alert, we are still suffering the consequences of around 100 disapproved product listing ads. As of right now (24 hours later) we still have 20 disapproved product listing ads that continue to affect our marketing efforts.

Aside from short term paid media implications, there is also a concern regarding SEO efforts as blocking Googlebot can have serious long term implications on a site’s indexing performance.

A quick Google search tells me this has been an ongoing issue, so we really need some clarifications from the Cloudflare team and assurance that a long term solution is in place!

4 Likes
#16

Hi cscharff,

Could you please take a look at this comment?

Thank you!

#17

My recommendation would be to contact support (or, if you are an enterprise customer, your Customer Success Manager) and ask if they can provide an incident summary or answer any questions. I work for Cloudflare and know some things, but I’m not part of that team and was not involved with this particular issue, so unfortunately I can’t provide meaningful context.

1 Like
#18

The request

Following cscharff’s recommendation, I opened a support ticket (#1687378) referencing the incident and the questions above.

In the ticket, I asked the support team to respond directly in this topic. If that is not possible, I will post new comments here as soon as I get updates.


Your help

Considering the seriousness of the consequences that a problem like this can generate, I would like to ask the Community members to participate, contributing to the enrichment of the discussion, so that other users and the Cloudflare team can be aware of the relevance of this topic.

Please consider sharing

  • Your knowledge of the consequences of blocking search crawlers;
  • Your experience, current or previous, related to this kind of problem;
  • Your thoughts on how Cloudflare should proceed after the incident;
  • Your fears about having your business kicked out of the SERPs;
  • And so on…

Special thanks to the members who have been participating in this discussion so far: @cali_b @noc9 @user3011 @bluespire @cscharff @floripare @domjh @cloonan @boynet2 @nomesbiblicos

2 Likes
#19

First response

Unfortunately, none of the questions has been answered yet:

To keep you in the loop, at this time I’ve raised your ticket with our internal team who are working on providing some clarity on the incident. Please be assured that we will update you as soon as we have a new development that we can report back to you.

We appreciate you reaching out to help us better our process of making these incidents more clear to our customers.

— Mike Lee

I’ll be back to update you as soon as I have news.

1 Like
#20

I reported a similar issue last month: Firewall rules blocking legitimate Google bot. I ended up whitelisting the Googlebot user agent, Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html), and setting rule 100035 to Simulate.

Not clear what the real downside is of fake agents getting through, but just can’t take the chance that Google or one of the various Google platform tools will be blocked.

#21

The issue was not similar. That request was from a fake Google bot and was blocked accordingly.

You can certainly disable those rulesets mentioned in this thread if you don’t mind fake bots crawling.