Google console crawl anomaly

For several months I have been receiving “crawl errors” in Google Search Console.

I have 30,000 errors now. If I check any of them with “live test”, everything is OK. But obviously there was an error when Googlebot crawled.

I’ve checked my access.log: in all of these cases Googlebot never visited the site.
So I created a rule in the Cloudflare firewall to “allow” all “known bots”. That gave me logs of all Googlebot visits at Cloudflare, before they are proxied to my site. And Googlebot is not there either.

So Googlebot is crawling my site, about 1-2 visits per minute. The site is indexed, everything is good. But not all Googlebot requests are received by Cloudflare, and in those cases I get “crawl errors” in the console.

But how is this related to Cloudflare?
Please open a support ticket with Google.

It’s related if more people have the same problem.

Actually no, unless you say everything was working fine before.

Googlebot is not always reaching the Cloudflare network, and that’s a fact, because I checked the logs in front of my server. The loss is about 5% of requests.

When I analyzed the logs, I found a strange pattern.

I found “good” Googlebot IPs and “bad” ones.

good
66.249.73.193
66.249.73.198
66.249.73.206
66.249.73.208
66.249.73.210
66.249.73.222
66.249.73.223

bad
66.249.73.199
66.249.73.201
66.249.73.203
66.249.73.212
66.249.73.214
66.249.73.216
66.249.73.220
66.249.73.221

The bad IPs never reach my site, but the good ones always do. After a request from a bad IP, a request for the same page comes from another IP within the same second. That one reaches the site and is in access.log.
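The good/bad split above can be reproduced with a small script. This is only a rough sketch: `split_good_bad` is a hypothetical helper, and it assumes you have already extracted the Googlebot client IPs from the Cloudflare firewall events and from access.log, and that access.log records the real visitor IP (for example restored from the `CF-Connecting-IP` header) rather than a Cloudflare edge IP.

```python
import ipaddress


def _ip_key(ip):
    """Sort key that orders addresses numerically instead of as strings."""
    addr = ipaddress.ip_address(ip)
    return (addr.version, int(addr))


def split_good_bad(cloudflare_ips, origin_ips):
    """Split the IPs seen at Cloudflare into 'good' and 'bad'.

    good: the IP also appears at least once in the origin access.log
    bad:  the IP was logged by Cloudflare but never appears at the origin
    """
    seen_at_origin = set(origin_ips)
    good, bad = set(), set()
    for ip in cloudflare_ips:
        if ip in seen_at_origin:
            good.add(ip)
        else:
            bad.add(ip)
    return sorted(good, key=_ip_key), sorted(bad, key=_ip_key)
```

An IP that lands in the bad list here only means it was not seen at the origin during the period covered by both logs, so the two exports should cover the same time window.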

I think there is a real problem here, and there should be a lot of people here with the same problem.

P.S.
I started receiving crawl errors in Google Search Console on January 26.

Do you have any country blocking rules in your Cloudflare backend, or do you see a high number of blocked threats?

If this is the case then please open a support ticket, as the community can’t help here.

I created a ticket 2 months ago; nothing was solved.

Now I’ve created one again, no answer yet.

If there is a problem, more people will have it. So let’s just wait.

The thing is, when I found those “bad” Google IPs my thought was “OK, those are my crawl anomalies”. No. Those bad IPs exist, and their requests are not proxied by Cloudflare.
But the crawl errors in Google Search Console are completely different errors.

Can you show the exact errors that Google is reporting? I haven’t observed any issues with Googlebot reaching my Cloudflare sites recently.

Page fetch Failed: Crawl anomaly

As I said: Google - Cloudflare - my server.

I logged all Google requests to my site through the firewall, so I can now cross-check Google’s logs against Cloudflare’s logs without my server being involved at all. The thing is, Google says it requested this particular page at this particular time, and in the Cloudflare logs there is no Google request for this page, not at that time and not at any other time. So it is a network error between Google and Cloudflare.

The second thing that should be investigated is those bad Google IPs that never reach the site. They reach Cloudflare, and then the same request comes from a different IP and finally reaches the site. All of this needs to be checked by Cloudflare personnel with access. The only reason this post is here is that there must be more people with the same problem.

Please share the ticket ID so some moderators here can check it.
Also please make sure that this is not related to any issue on your side.

Does it have a more specific error message than that? Does it say why it failed? Even a numeric error code would do.

https://support.cloudflare.com/hc/en-us/requests/1870809

To summarize, these are the things I want to highlight:

  1. In January I started receiving “crawl anomaly” errors in Google Search Console. I’ve found a way to get a log of all Googlebot access not to my site, but to Cloudflare before it is proxied to my site. So you go into my firewall rules. There, filter on the rule with either “allow” or “bypass”; both of those rules log all “known bots”. Then you add an additional filter:
    User agent contains Googlebot/
    That is the access log of Googlebot to Cloudflare. I cross-checked this with the Google crawl anomaly log and found that no requests from Google were logged for those URLs. Either it’s an error on Google’s side, or the request was filtered by Cloudflare before it reached the firewall.

https://*.com/board/showthread.php?t=473947
Last crawled on 20 Apr 2020, 13:48:30

https://*.com/board/showthread.php?t=241398
Last crawled on 20 Apr 2020, 13:35:47

https://*.com/board/showthread.php?t=180051
Last crawled on 20 Apr 2020, 13:35:43

https://*.com/board/showthread.php?t=243516
Last crawled on 20 Apr 2020, 13:28:00

Those are the latest crawl errors from Google Search Console, with times in GMT+3. But you can simply search the query strings for the IDs from these URLs. You will find that Googlebot never accessed Cloudflare for them.

Why all this? Because there is no possibility that my server is doing something wrong. I am not reading the server’s logs, but Cloudflare’s logs in front of the server.
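For anyone who wants to repeat this cross-check, here is a minimal sketch of it. All function names are hypothetical. It assumes the Search Console rows were copied as (URL, “Last crawled on” text) pairs with times in GMT+3 as stated above, that the Cloudflare firewall events were exported as a list of requested URLs or paths, and that the month abbreviations are parsed under an English locale.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Search Console times in this thread are GMT+3 (taken from the post above).
CONSOLE_TZ = timezone(timedelta(hours=3))


def parse_console_time(text):
    """Turn '20 Apr 2020, 13:48:30' (GMT+3) into an aware UTC datetime."""
    local = datetime.strptime(text, "%d %b %Y, %H:%M:%S")
    return local.replace(tzinfo=CONSOLE_TZ).astimezone(timezone.utc)


def thread_id(url):
    """Return the value of the 't' query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("t")
    return values[0] if values else None


def missing_from_cloudflare(console_rows, cloudflare_urls):
    """List Search Console errors whose thread ID never appears at Cloudflare.

    console_rows:    iterable of (url, last_crawled_text)
    cloudflare_urls: iterable of URLs or paths from the firewall event log
    Returns a list of (url, crawl time in UTC) for the unmatched rows.
    """
    logged_ids = {thread_id(u) for u in cloudflare_urls}
    logged_ids.discard(None)
    return [
        (url, parse_console_time(ts))
        for url, ts in console_rows
        if thread_id(url) not in logged_ids
    ]
```

Matching on the thread ID alone mirrors the "not at this time, not at any other time" check; the UTC time is returned so each miss can also be looked up by timestamp in the Cloudflare dashboard.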

  2. There is a strange second issue that I found while analyzing the Cloudflare firewall logs of Googlebot accessing my site. I found that there are good Google IPs, which always reach the site, and there are “bad” ones. It looks like a bad one gets an error, and at the same moment the same page is always requested by another Google IP. The second request is in the access log, the first is not. So the first reaches Cloudflare but is not proxied to the site.
    This second issue is not in the Google Search Console error logs.
    good
    66.249.73.193
    66.249.73.198
    66.249.73.206
    66.249.73.208
    66.249.73.210
    66.249.73.222
    66.249.73.223

    bad
    66.249.73.199
    66.249.73.201
    66.249.73.203
    66.249.73.212
    66.249.73.214
    66.249.73.216
    66.249.73.220
    66.249.73.221

It’s very simple to see. The IPs always change, but if you just look at the logs you will see strange entries where 2 requests come in at the same time. Like this:
23 Apr, 2020 14:18:00
Bypass
United States
66.249.73.180
Firewall rules

23 Apr, 2020 14:18:00
Bypass
United States
66.249.73.184
Firewall rules

The second one requests the same page as the first one and is in the logs; it reached the server. But the first one failed. I am just highlighting it. Maybe the first one is not supposed to be proxied.
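These paired entries can be found automatically instead of by eye. A minimal sketch, assuming the firewall events were exported as (timestamp, IP, URL) tuples; the export format and the `same_second_pairs` helper are assumptions, and the URL column is not visible in the excerpt above.

```python
from collections import defaultdict


def same_second_pairs(events):
    """Find pages requested by more than one IP within the same second.

    events: iterable of (timestamp, ip, url), with the timestamp at
            one-second resolution as shown in the firewall log.
    Returns {(timestamp, url): [ip, ...]} for every such collision.
    """
    ips_by_request = defaultdict(set)
    for timestamp, ip, url in events:
        ips_by_request[(timestamp, url)].add(ip)
    return {
        key: sorted(ips)
        for key, ips in ips_by_request.items()
        if len(ips) > 1
    }
```

Each collision can then be compared with access.log to see which of the IPs actually reached the server.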

Have you enabled any firewall rules, page rules, or WAF rules?
