For several months I have been receiving “crawl errors” in Google Search Console.
I now have 30,000 errors. If I check any of them with the “live test”, everything is OK. But obviously, when Googlebot crawled the page, there was an error.
I’ve checked my access.log: in every one of these cases, Googlebot never visited the site.
So I created a Cloudflare firewall rule to “allow” all “known bots”. That gave me a log of every Googlebot visit to my site as Cloudflare sees it, before the request is proxied to my server. Googlebot is not there either.
So Googlebot is crawling my site, at about 1-2 visits per minute. The site is indexed and everything looks good. But not all Googlebot requests are received by Cloudflare, and in those cases I get “crawl errors” in Search Console.
It is a fact that Googlebot does not always reach the Cloudflare network, because I checked the logs in front of my server. About 5% of requests are lost.
When I analyzed the logs, I found a strange pattern: there are “good” Googlebot IPs and “bad” ones.
Good:
66.249.73.193
66.249.73.198
66.249.73.206
66.249.73.208
66.249.73.210
66.249.73.222
66.249.73.223
Bad:
66.249.73.199
66.249.73.201
66.249.73.203
66.249.73.212
66.249.73.214
66.249.73.216
66.249.73.220
66.249.73.221
The bad IPs never reach my site, while the good ones always do. Whenever a bad IP makes a request, a request for the same page comes from another IP in the same second, and that one reaches the site and shows up in access.log.
I think something is wrong, and there should be a lot of people here with the same problem.
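If you want to reproduce the good/bad split from your own logs, a minimal sketch could look like this. It assumes you have already parsed the Cloudflare firewall events and your origin access.log into hypothetical `(ip, path, unix_second)` tuples; the function names and the one-second tolerance are illustrative, not something Cloudflare provides.

```python
import ipaddress
from collections import defaultdict

# Hypothetical record shape: (ip, path, unix_second). Both inputs are assumed to be
# parsed beforehand from a Cloudflare firewall-event export and from the origin access.log.

def classify_ips(edge_events, origin_hits, tolerance=1):
    """Count, per IP, how many edge events have a matching origin hit.

    A match means the same IP and path within +/- `tolerance` seconds, to absorb
    small clock differences between Cloudflare and the origin server.
    """
    origin = set(origin_hits)
    stats = defaultdict(lambda: [0, 0])  # ip -> [seen_at_edge, reached_origin]
    for ip, path, ts in edge_events:
        stats[ip][0] += 1
        if any((ip, path, ts + d) in origin for d in range(-tolerance, tolerance + 1)):
            stats[ip][1] += 1
    return {ip: tuple(counts) for ip, counts in stats.items()}


def split_good_bad(stats):
    """'Good' IPs always reached the origin, 'bad' IPs never did; mixed IPs are left out."""
    good = sorted((ip for ip, (seen, reached) in stats.items() if reached == seen),
                  key=ipaddress.ip_address)
    bad = sorted((ip for ip, (seen, reached) in stats.items() if reached == 0),
                 key=ipaddress.ip_address)
    return good, bad
```

IPs that sometimes reach the origin and sometimes do not fall into neither list, which is worth checking separately before calling an IP “bad”.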
P.S.
I started receiving crawl errors in Google Search Console on January 26.
I created a ticket 2 months ago, and nothing was resolved.
Now I have created one again; no answer yet.
If the problem is real, more people will have it, so let’s just wait.
When I found those “bad” Google IPs, my first thought was “OK, those are my crawl anomalies.” They are not. The bad IPs are a separate issue: their requests reach Cloudflare but are not proxied to my server.
The crawl errors in Search Console are a completely different set of errors.
I now log every Google request to my site through the firewall, so I can cross-check Google’s logs against Cloudflare’s logs without my server being involved at all. Google says it requested a particular page at a particular time, yet Cloudflare’s logs show no Google request for that page, neither at that time nor at any other time. That points to a network error between Google and Cloudflare.
The second thing that should be investigated is those bad Google IPs that never reach the site. They do reach Cloudflare, and then the same request comes from a different IP and finally reaches the site. All of this needs to be checked by Cloudflare staff with the necessary access. The only reason I am posting here is that there must be more people with the same problem.
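The cross-check itself can be sketched as below, assuming both sides are available as hypothetical `(url, timezone-aware datetime)` pairs: crawl errors exported from Search Console and Googlebot events exported from the Cloudflare firewall log. The two-minute matching window is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone

def _utc(ts):
    """Convert an aware datetime to UTC; naive datetimes are rejected as ambiguous."""
    if ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return ts.astimezone(timezone.utc)

def missing_at_edge(console_errors, edge_events, window=timedelta(minutes=2)):
    """Return the Search Console crawl errors with no matching Cloudflare event.

    A crawl error counts as "seen" if Cloudflare logged the same URL
    within `window` of the time Search Console reports.
    """
    seen = {}
    for url, ts in edge_events:
        seen.setdefault(url, []).append(_utc(ts))
    missing = []
    for url, ts in console_errors:
        ts_utc = _utc(ts)
        if not any(abs(t - ts_utc) <= window for t in seen.get(url, [])):
            missing.append((url, ts))
    return missing
```

Dividing the number of missing entries by the total number of Googlebot requests over the same period gives a loss rate you can compare with the roughly 5% mentioned above.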
Starting in January, I began receiving “crawl anomaly” errors in Google Search Console. I found a way to log every Googlebot request, not to my site but to Cloudflare, before it is proxied to my site. Go to your firewall rules and create a rule for all “known bots” whose action is either “Allow” or “Bypass” (both actions log the matching requests). Then add an additional filter:
User agent contains Googlebot/
That gives you an access log of Googlebot requests to Cloudflare. I cross-checked it against Google’s crawl anomaly log and found that none of those Google requests were logged. Either it is an error on Google’s side, or the requests were filtered by Cloudflare before they reached the firewall.
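A user-agent filter alone matches anything that claims to be Googlebot, so before drawing conclusions about the “bad” IPs it may help to confirm they really belong to Google. Google documents a reverse-then-forward DNS check for this; below is a minimal sketch of it. The `reverse`/`forward` parameters are only there so the check can be tested offline, and `gethostbyname_ex` covers IPv4 only.

```python
import socket

# Googlebot's PTR name should end in googlebot.com or google.com,
# and a forward lookup of that name should return the original IP.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return True if `ip` passes the reverse + forward DNS check (IPv4 only)."""
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror: no PTR record or lookup failure
        return False
    if not host.endswith(GOOGLE_DOMAINS):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses
```

With the real resolvers, calling `is_verified_googlebot("66.249.73.199")` performs live DNS lookups, so the result depends on your network and on Google’s current DNS records.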
https://*.com/board/showthread.php?t=473947
Last crawled on 20 Apr 2020, 13:48:30
https://*.com/board/showthread.php?t=241398
Last crawled on 20 Apr 2020, 13:35:47
https://*.com/board/showthread.php?t=180051
Last crawled on 20 Apr 2020, 13:35:43
https://*.com/board/showthread.php?t=243516
Last crawled on 20 Apr 2020, 13:28:00
Those are the latest crawl errors from Search Console (times in GMT+3). You can simply search the logs for the query strings with the IDs from these URLs; you will find that Googlebot never accessed Cloudflare for them.
Why all this? Because it rules out the possibility that my server is doing something wrong: I am reading not my server’s logs but Cloudflare’s logs, in front of the server.
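To do that search, each Search Console row can be turned into a thread ID plus a UTC time. This is a sketch assuming the “Last crawled” format shown above and the GMT+3 offset mentioned in the post; `search_key` and `matching_lines` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# The post says the Search Console "Last crawled" times are shown in GMT+3.
CONSOLE_TZ = timezone(timedelta(hours=3))

def search_key(url, last_crawled):
    """Return (thread id from ?t=..., crawl time converted to UTC) for one crawl-error row."""
    thread_id = parse_qs(urlsplit(url).query)["t"][0]
    local = datetime.strptime(last_crawled, "%d %b %Y, %H:%M:%S").replace(tzinfo=CONSOLE_TZ)
    return thread_id, local.astimezone(timezone.utc)

def matching_lines(lines, thread_id):
    """Log lines requesting exactly this thread id (t=2413 must not match t=241398)."""
    pattern = re.compile(rf"[?&]t={re.escape(thread_id)}(?![0-9])")
    return [line for line in lines if pattern.search(line)]
```

For example, `search_key("https://*.com/board/showthread.php?t=473947", "20 Apr 2020, 13:48:30")` yields thread `473947` at 10:48:30 UTC, which is the time to look for in the Cloudflare log.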
There is also a strange second issue that I found while analyzing the Cloudflare firewall logs of Googlebot accessing my site. There are good Google IPs, which always reach the site, and “bad” ones. It looks like the bad one gets an error, and at the same moment the same page is always crawled by another Google IP. That second request is in the server logs; the first is not. So the first request reaches Cloudflare but is not proxied to the site.
This second issue does not appear in the Search Console error logs.
It’s very simple to spot. The IPs always change, but if you just look at the logs you will see strange lines where 2 requests arrive at the same time, like this:
23 Apr, 2020 14:18:00 | Bypass | United States | 66.249.73.180 | Firewall rules
23 Apr, 2020 14:18:00 | Bypass | United States | 66.249.73.184 | Firewall rules
The second request fetches the same page as the first, appears in the server logs, and reached the server. The first one failed. I am just highlighting it; maybe the first one is not supposed to be proxied.
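Finding these same-second pairs by hand gets tedious, so here is a minimal sketch that groups firewall events by timestamp and path. It assumes events exported as hypothetical `(timestamp, ip, path)` tuples; the path field is an assumption, since the dashboard rows above show only time, action, country and IP.

```python
from collections import defaultdict

def same_second_pairs(events):
    """Group firewall events by (timestamp, path) and keep groups hit by 2+ distinct IPs.

    `events` is an iterable of (timestamp, ip, path) tuples; IPs keep first-seen order.
    """
    groups = defaultdict(list)
    for ts, ip, path in events:
        if ip not in groups[(ts, path)]:
            groups[(ts, path)].append(ip)
    return {key: ips for key, ips in groups.items() if len(ips) > 1}
```

Each group it returns is a candidate “bad IP followed by retry from another IP” case; comparing those IPs against access.log tells you which of the two actually reached the server.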