Cloudflare returns 403 Forbidden to Googlebot

What is the name of the domain?

gidapp.com

What is the error number?

403

What is the error message?

403 - Forbidden

What is the issue you’re encountering?

Google Search Console reports some pages/links as “Blocked due to access forbidden (403)”.

What steps have you taken to resolve the issue?

Occasionally, Google Search Console reports some pages/links as “Blocked due to access forbidden (403)”.

In the past, these were few and far between, so I tolerated them, but recently I have been getting these reports with sometimes up to 59 pages “blocked” in a single report.

So, I finally decided to investigate it thoroughly today before posting my issue here.

On Cloudflare, for my domain, I visited this page:

Security > Analytics

Then picked the following filters:

  • Edge status code equals 403 Forbidden
  • Source ASN equals 15169 (Google)
  • Country equals United States
  • Mitigated equals No

I found 13 records.
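For anyone working from exported request logs rather than the dashboard, the same four filters can be sketched in a few lines of Python. This is only an illustration: the record layout and the field names (`edge_status`, `asn`, `country`, `mitigated`) are assumptions made up for the example, not Cloudflare's actual export schema.

```python
from typing import Iterable

GOOGLE_ASN = 15169  # Source ASN used in the dashboard filter above


def unmitigated_google_403s(records: Iterable[dict]) -> list[dict]:
    """Return the records that match all four dashboard filters.

    The dictionary keys are hypothetical; rename them to whatever
    your own log export actually uses.
    """
    return [
        r for r in records
        if r.get("edge_status") == 403          # Edge status code equals 403
        and r.get("asn") == GOOGLE_ASN          # Source ASN equals 15169
        and r.get("country") == "US"            # Country equals United States
        and not r.get("mitigated", False)       # Mitigated equals No
    ]
```

Counting the returned list gives the same kind of number as the "records" figure on the Analytics page, assuming the export covers the same time window.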

Please see the attached partial screenshot of 1 record. (The source IP address is definitely Googlebot.)

All 13 records say:

  • Mitigation: Not mitigated
  • Edge status code: 403 - Forbidden

I don’t know how to figure this out. Please help.

What are the steps to reproduce the issue?

See above

Screenshot of the error

Hi @jdesilva,

Please check whether your Super Bot Fight Mode is set up to block Verified bots, or whether Block AI Bots is enabled. Either would block known good bots like Googlebot.

Thank you, @bujangnim .

I have (just now) disabled Block AI Bots.

Meanwhile, the Verified bots setting is still set to Allowed.

I will continue to monitor the situation and reply here if the issue persists.

12 hours later, there appears to be no change. The report is now showing 73 records of Googlebot being blocked, with nearly 40 of them within the last 12 hours, that is, since I made the changes.

Check your Cloudflare Security Events to see exactly what’s triggering this 403, so you can aim your mitigation directly at that.

As per the attached screenshot in the original post, it shows:

Mitigation = Not Mitigated

What does that mean?

It seems like this issue is just getting worse.

In the last 24 hours there were 120 instances of Googlebot being blocked with status 403 Forbidden on my web site.

I hope someone can look into this seriously.

As @GeorgeAppiah recommended, could you check your security events for the actual reason why the requests were blocked? You can find the event log here: https://dash.cloudflare.com/?to=/:account/:zone/security/events

The screenshot from the OP does not show the events log.

It is as if you are not reading my posts at all!

It clearly states right there in the report: Mitigation = Not Mitigated.

That means - at least from what I gather from the tooltip on the web page - “did not match a rule or… skipped/allowed”.

Screenshot from 2024-10-30 20-08-55

It rather looks like you are not reading our posts. Your screenshot is - again - from the Analytics page, NOT from the Events log.

Did you check the Events? The events page would show why a 403 would be issued to a request.

Of course I have checked the Events page.

Why would I find the details of this block on the Events page when it was not mitigated in the first place?

This is a page that was served by Cloudflare, and NOT mitigated, and yet the edge returned 403 Forbidden.

Because all Cloudflare features that issue a 403 should show up in the events log. I don’t think the same is true for the Mitigation field in the Analytics tab.

Are you maybe using any Workers? Or any Cloudflare optimization settings like Automatic Signed Exchanges, AMP Real URL, Early Hints, Smart Hints etc?

“Because all Cloudflare features that issue a 403 should show up in the events log.”

These are Googlebot IPs, so they are verified bots. In my Custom Rules, rule no. 1 detects verified bots and its action is set to Skip (which skips WAF features or disables specific Cloudflare security products for matching requests).

So on the Security > Events page, these IPs appear only under the Skip action in the Events summary, and NOT anywhere else.

In the last 24 hours there were 145 instances of Googlebot being blocked with status 403 Forbidden on my web site.

If you have the “Block AI bots” feature enabled at Cloudflare, then a request whose User-Agent contains GoogleOther would be detected and blocked or challenged, even though it is on the Verified Bot List.

These could also be fake Googlebots coming from Google ASNs, such as Google Cloud Platform.
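One way to tell a real Googlebot address from other traffic on Google's networks is to compare it against the Googlebot IP ranges that Google publishes as a JSON file. Below is a minimal sketch, assuming the file has a top-level `prefixes` list whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key. The `SAMPLE_RANGES` document is illustrative data written for this example, not a copy of Google's file, so load the current published list before relying on the result.

```python
import ipaddress
import json

# Illustrative stand-in for the published ranges file (NOT the real list).
SAMPLE_RANGES = """
{"prefixes": [
    {"ipv4Prefix": "66.249.64.0/19"},
    {"ipv6Prefix": "2001:db8::/32"}
]}
"""


def load_networks(doc: str) -> list:
    """Parse a ranges document into ipaddress network objects."""
    networks = []
    for entry in json.loads(doc).get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def in_ranges(ip: str, networks: list) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Comparing an IPv4 address with an IPv6 network is simply False.
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = load_networks(SAMPLE_RANGES)
    for candidate in ("66.249.73.233", "66.249.74.70"):
        print(candidate, in_ranges(candidate, nets))
```

An address outside the published ranges but inside a Google ASN is most likely a cloud customer rather than the crawler.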

Check the Security tab → Events to determine which paths they are trying to visit, and from there determine which Cloudflare service was triggered and blocked the requests coming from them, even though you’ve configured your Custom Rule to SKIP.

If they appear as SKIP, great: that means your DNS records are proxied :orange: and your Custom Rule is working as expected.

Even though you SKIP verified bots, I am afraid the Analytics will still show 403s for fake Googlebots, or for others coming from the Google ASN and Google IPs.

Would you mind sharing a screenshot or the fields you’ve used for your filter? :thinking:

These are some bots coming from Google (its ASN and its IP addresses), some verified, others not:

403s for Google ASN and Google IP addresses:
image

So, from the above, I get quite a lot of Googlebots that I really don’t like, and it is better that they are blocked or challenged than left scraping and crawling my Website and generating unwanted traffic.

Regarding Google Search Console, I’d re-check my robots.txt file, a sitemap.xml file that I might be missing, and any robots meta tag on my Website in case it’s blocking something, in addition to the Cloudflare Security options available to me, which I would tune to allow only the real Googlebot to crawl my Website (no AI Googlebot).

I already disabled it, but it did not make a difference at all; see: Cloudflare returns 403 Forbidden to Googlebot - #3 by jdesilva

Under ‘IP Access Rules’ for my domain I have ASN AS396982 Google LLC - which I believe is GCP - set to action: Managed Challenge.

Anyway, the IP addresses appearing in my issue/report are all verified Googlebot IPs. I have even verified them manually, just to be sure, by running reverse and then forward DNS lookups on them.
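The reverse-then-forward DNS check described above can be scripted. Here is a minimal sketch using only the standard library. The accepted hostname suffixes reflect my reading of Google's crawler verification guidance and should be treated as an assumption; the function name and the injectable `reverse` / `forward` parameters are choices made for this example. Note that `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable

# Assumed suffixes for genuine Googlebot reverse-DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> list:
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], list] = _forward,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the original address.
        return ip in forward(host)
    except OSError:
        return False


if __name__ == "__main__":
    for candidate in ("66.249.73.233", "66.249.74.70"):
        print(candidate, is_verified_googlebot(candidate))
```

The resolver functions are passed in as parameters so the logic can be exercised offline with stand-ins; in normal use the defaults perform live DNS queries.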

Yes, they all appear as “SKIP”. Every day Googlebot successfully makes tens of thousands of requests to my site, so this typically affects only a fraction of them, but it is certainly a slowly growing issue for my domain that I wanted to bring up. See the Google Search Console report for one site, for example:

Notice how it is becoming a growing problem?

I don’t mind at all. Here you go:
Screenshot from 2024-11-03 08-50-24

Using the same filters with our internal logs, we can see it’s getting blocked by one of your custom rules. :cowboy_hat_face:

Go to the /security/events page to track down which rule is matching so you can adjust your configuration as needed. :orange_heart:

First of all, thank you for looking into this issue. I appreciate it very much.

Maybe I am stupid, because I cannot find any useful information on the Security/Events page for my issue.

Let me explain again what I am doing and how I’m getting my information.

On the Security/Analytics page, with the filters you can see in the screenshot, I find all the Googlebot IPs that are blocked in the last 24 hours.

If I take just the top 2 IP addresses from that report and use them to filter the report from the Security/Events page, I get this:

What should I be doing differently to find out why requests from IP addresses 66.249.73.233 and 66.249.74.70 were blocked?

Security/Analytics does not equal Security/Events, which is the problem.

:point_down:

Note: the “IP” field has been renamed “Source IP” since the screenshots were taken.

:point_up_2:

That is the approach @GeorgeAppiah mentioned, and which @Laudian expanded on above with the Magic Link that takes you directly to the correct location.

The Magic Link above (repeated here) will take you directly to the correct place, where you can run such a search:

https://dash.cloudflare.com/?to=/:account/:zone/security/events

:point_down:

:point_up_2:

This Google Cloud instance was blocked by my own WAF rule.

The WAF rule named “Block empty User-Agent” was the one that triggered the blocking.
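To illustrate how a Skip rule for verified bots and a “Block empty User-Agent” rule can coexist and still produce a 403 for traffic from Google's network, here is a deliberately simplified first-match model of an ordered rule list. It is a sketch for reasoning only, not Cloudflare's actual evaluation engine; the `Request` fields, the name of the skip rule, and the placeholder IP address are assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    ip: str
    user_agent: str
    verified_bot: bool  # hypothetical flag standing in for bot verification


@dataclass
class Rule:
    name: str
    matches: Callable[[Request], bool]
    action: str  # "skip" or "block"


# Ordered like the thread describes: the verified-bot skip rule comes first.
RULES = [
    Rule("Skip verified bots", lambda r: r.verified_bot, "skip"),
    Rule("Block empty User-Agent", lambda r: r.user_agent.strip() == "", "block"),
]


def evaluate(req: Request, rules: list = RULES) -> tuple:
    """Return (action, rule name) for the first matching rule."""
    for rule in rules:
        if rule.matches(req):
            return rule.action, rule.name
    return "allow", None


if __name__ == "__main__":
    googlebot = Request("66.249.73.233", "Googlebot/2.1", verified_bot=True)
    cloud_vm = Request("198.51.100.7", "", verified_bot=False)  # placeholder IP
    print(evaluate(googlebot))
    print(evaluate(cloud_vm))
```

In this model a verified crawler never reaches the block rule, while an unverified request with an empty User-Agent does, whichever network it comes from.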
