Blocking Google bot

Hello, I have read a number of complaints about Cloudflare blocking Googlebot, but Cloudflare always says it is not.

I am trying to get Google to crawl my site, but I have failed.

I cannot even submit a sitemap in Search Console. When I try to submit a sitemap, Search Console returns HTTP Error: 403.

I believe it is Cloudflare blocking access, because the URL https://ugwatch.com/sitemap-2.xml is accessible in the browser but can’t be fetched in Search Console.

When I checked with Google, they assured me it can only be CF affecting the process.

How can I stop CF from interfering with Googlebot?

I have now temporarily restored my name servers to the default domain provider (removed the CF name servers), and now the sitemap can be fetched.

If Cloudflare was blocking this, you’d see it in the Firewall Events Log.

Have you read through similar posts?
https://community.cloudflare.com/search?q=google%20403

Hello.

I have been trying to read the firewall events, and this is what I got:
It seems it challenged a few unwanted bots, but not Googlebot…
I am now confused about what the matter could be because, as noted earlier, the sitemap could only be submitted after temporarily removing the CF name servers.

{
  "result": [
    {
      "kind": "firewall",
      "source": "securityLevel",
      "action": "challenge",
      "rule_id": "torfallback",
      "ip": "213.139.206.28",
      "ip_class": "tor",
      "country": "T1",
      "colo": "LHR",
      "host": "ugwatch.com",
      "method": "GET",
      "proto": "HTTP/2",
      "scheme": "https",
      "ua": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3563.0 Safari/537.36",
      "uri": "/",
      "matches": [
        {
          "rule_id": "torfallback",
          "source": "securityLevel",
          "action": "challenge"
        }
      ],
      "occurred_at": "2020-06-11T21:54:37Z"
    },
    {
      "kind": "firewall",
      "source": "securityLevel",
      "action": "challenge",
      "rule_id": "badscore",
      "ip": "23.237.4.26",
      "ip_class": "badHost",
      "country": "US",
      "colo": "DFW",
      "host": "ugwatch.com",
      "method": "GET",
      "proto": "HTTP/1.0",
      "scheme": "https",
      "ua": "Mozilla/5.0 (compatible; AlphaBot/3.2; +http://alphaseobot.com/bot.html)",
      "uri": "/",
      "matches": [
        {
          "rule_id": "badscore",
          "source": "securityLevel",
          "action": "challenge"
        }
      ],
      "occurred_at": "2020-06-11T08:36:07Z"
    },
    {
      "kind": "firewall",
      "source": "securityLevel",
      "action": "challenge",
      "rule_id": "torfallback",
      "ip": "185.220.101.46",
      "ip_class": "tor",
      "country": "T1",
      "colo": "HAM",
      "host": "ugwatch.com",
      "method": "GET",
      "proto": "HTTP/1.1",
      "scheme": "https",
      "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36",
      "uri": "/",
      "matches": [
        {
          "rule_id": "torfallback",
          "source": "securityLevel",
          "action": "challenge"
        }
      ],
      "occurred_at": "2020-06-10T22:51:45Z"
    },
    {
      "kind": "firewall",
      "source": "securityLevel",
      "action": "challenge",
      "rule_id": "badscore",
      "ip": "23.237.4.26",
      "ip_class": "badHost",
      "country": "US",
      "colo": "DFW",
      "host": "ugwatch.com",
      "method": "GET",
      "proto": "HTTP/1.0",
      "scheme": "https",
      "ua": "Mozilla/5.0 (compatible; AlphaBot/3.2; +http://alphaseobot.com/bot.html)",
      "uri": "/",
      "matches": [
        {
          "rule_id": "badscore",
          "source": "securityLevel",
          "action": "challenge"
        }
      ],
      "occurred_at": "2020-06-10T12:09:34Z"
    }
  ],
  "result_info": {
    "cursors": {
    },
    "scanned_range": {
      "since": "2020-06-10 12:09:51",
      "until": "2020-06-11 21:54:42"
    }
  },
  "success": true,
  "errors": [],
  "messages": []
}
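
For anyone who wants to search a longer export for Googlebot instead of reading it by eye, here is a minimal sketch. It assumes the export has the same shape as the JSON above (a `result` list whose entries carry `ua`, `uri`, `action`, `rule_id` and `occurred_at`); the function name `find_events` and the trimmed `sample` string are only for illustration.

```python
import json


def find_events(events_json, ua_substring="Googlebot", uri_substring="sitemap"):
    """Return firewall events whose user agent or URI contains the given text."""
    data = json.loads(events_json)
    hits = []
    for event in data.get("result", []):
        ua = event.get("ua", "")
        uri = event.get("uri", "")
        if ua_substring.lower() in ua.lower() or uri_substring.lower() in uri.lower():
            hits.append(
                (event.get("occurred_at"), event.get("action"), event.get("rule_id"), ua, uri)
            )
    return hits


# Two events trimmed from the export above (only the fields used here).
sample = """
{
  "result": [
    {"action": "challenge", "rule_id": "torfallback", "uri": "/",
     "ua": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3563.0 Safari/537.36",
     "occurred_at": "2020-06-11T21:54:37Z"},
    {"action": "challenge", "rule_id": "badscore", "uri": "/",
     "ua": "Mozilla/5.0 (compatible; AlphaBot/3.2; +http://alphaseobot.com/bot.html)",
     "occurred_at": "2020-06-11T08:36:07Z"}
  ]
}
"""

print(find_events(sample))  # prints [] because neither event involves Googlebot or the sitemap
```

An empty result only says that no matching event was recorded in the scanned range; it does not say where the 403 came from.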

Yes, I have checked similar posts. One of them, "403 Error - Google Search Console - #8 by user7872", described exactly the same issue.

He too couldn’t get Google to crawl his site and also couldn’t submit the sitemap. And when he removed the CF name servers, Google was able to crawl and he was able to fetch the sitemap.

Also worth noting is that other known bots seem to crawl successfully.

Unfortunately, he did not get a solution and just had to give up using CF.

I really hope we resolve this because I want to use CF. I have already set up some logic on my site that depends on CF rules.

Also, to test further, I have just tried adding a new site to CF: schoolskipper.org.

As soon as I added it to CF, I was not able to submit a sitemap to Search Console.
It started to return a 403 error.

I get the feeling some hosts block certain types of traffic that come through Cloudflare. I frequently get 403 errors when I ‘curl’ a site when troubleshooting, but it loads fine in Firefox.

If the 403 isn’t showing up in Cloudflare’s Firewall Events Log, then I suspect the 403 is coming from your host. Have you checked your server logs for this?

I have just checked the server logs.
They actually show a 404 error, not a 403, for Googlebot requests to sitemap.xml that arrive through Cloudflare. I expected it to be a 403, as Google reports.

197.239.7.8 - - [10/Jun/2020:16:25:57 +0200] "GET /sitemap.xml HTTP/1.1" 200 8637 "https://translatedmovies.com/sitemap.xml" "PostmanRuntime/7.25.0"
172.68.186.101 - - [10/Jun/2020:22:10:46 +0200] "GET /sitemap.xml HTTP/1.1" 200 7827 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
172.69.70.116 - - [10/Jun/2020:22:10:50 +0200] "GET /sitemap.xml HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
197.234.242.85 - - [10/Jun/2020:23:35:18 +0200] "GET /sitemap.xml HTTP/1.1" 200 7827 "https://search.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
172.69.70.116 - - [10/Jun/2020:23:35:35 +0200] "GET /sitemap.xml HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
172.69.70.116 - - [11/Jun/2020:02:40:33 +0200] "GET /sitemap.xml HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
172.69.70.116 - - [11/Jun/2020:07:37:50 +0200] "GET /sitemap.xml HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
162.158.159.61 - - [11/Jun/2020:08:30:24 +0200] "GET /sitemap.xml HTTP/1.1" 200 7827 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
162.158.159.61 - - [11/Jun/2020:08:30:28 +0200] "GET /sitemap.xml HTTP/1.1" 304 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
162.158.187.194 - - [11/Jun/2020:23:26:02 +0200] "GET /sitemap.xml HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
162.158.107.210 - - [12/Jun/2020:00:53:13 +0200] "GET /sitemap.xml HTTP/1.1" 200 7827 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
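
To make the pattern in these lines easier to see, here is a rough sketch that tallies the response status per crawler. It assumes the log uses the combined format shown above; the regular expression, the `status_by_agent` name and the three-way agent grouping are illustrative choices, not anything the host or Cloudflare provides.

```python
import re
from collections import Counter

# Combined log format: ip, identity, user, [time], "request", status, size, "referer", "ua"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def status_by_agent(lines):
    """Count (agent, status) pairs, grouping user agents into three buckets."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        ua = match.group("ua")
        if "Googlebot" in ua:
            agent = "Googlebot"
        elif "bingbot" in ua:
            agent = "bingbot"
        else:
            agent = "other"
        counts[(agent, int(match.group("status")))] += 1
    return counts


# Three lines copied from the log above.
sample_lines = [
    '172.68.186.101 - - [10/Jun/2020:22:10:46 +0200] "GET /sitemap.xml HTTP/1.1" 200 7827 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"',
    '172.69.70.116 - - [10/Jun/2020:22:10:50 +0200] "GET /sitemap.xml HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '162.158.107.210 - - [12/Jun/2020:00:53:13 +0200] "GET /sitemap.xml HTTP/1.1" 200 7827 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

for (agent, status), count in sorted(status_by_agent(sample_lines).items()):
    print(agent, status, count)
```

On these lines only the request with the Googlebot user agent gets a 404, while the browser and bingbot requests to the same path get a 200, which suggests the origin treats that user agent differently.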

Is it getting served from cloudflare?

I believe so, because the requests that got the 404 come from 172.69.70.116, which looks like a CF IP.
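
One way to check this rather than guess is to compare the address against Cloudflare's published IPv4 ranges. The sketch below hard-codes a snapshot of those ranges, which is an assumption on my part and may be out of date; the current list is published at https://www.cloudflare.com/ips-v4. The name `is_cloudflare_ip` is just for illustration.

```python
import ipaddress

# Snapshot of Cloudflare's published IPv4 ranges (assumption: may be outdated,
# check https://www.cloudflare.com/ips-v4 for the current list).
CLOUDFLARE_IPV4 = [
    "173.245.48.0/20", "103.21.244.0/22", "103.22.200.0/22", "103.31.4.0/22",
    "141.101.64.0/18", "108.162.192.0/18", "190.93.240.0/20", "188.114.96.0/20",
    "197.234.240.0/22", "198.41.128.0/17", "162.158.0.0/15", "104.16.0.0/13",
    "104.24.0.0/14", "172.64.0.0/13", "131.0.72.0/22",
]
NETWORKS = [ipaddress.ip_network(cidr) for cidr in CLOUDFLARE_IPV4]


def is_cloudflare_ip(ip):
    """Return True if the address falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)


# Source addresses taken from the server log in this thread.
for ip in ["172.69.70.116", "162.158.159.61", "197.234.242.85", "197.239.7.8"]:
    print(ip, is_cloudflare_ip(ip))
```

With this snapshot, the first three addresses are inside Cloudflare ranges and 197.239.7.8 is not, so the 404 lines for Googlebot did arrive through Cloudflare.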

Oh, but I have currently removed the CF name servers, so the site is not using CF right now.

The hosting server logs a 404 for the sitemap when Googlebot tries to access it… not sure why.

waakobrian,

I have an idea: create a firewall rule that allows all known bots.

After an hour, please check the logs for that firewall rule and see whether Googlebot still shows the 404 in the Cloudflare dashboard.

Hello,

I tried this, but Search Console still cannot fetch my sitemap.

I am now testing with schoolskipper.org… it’s the one with CF name servers.

But since I have read about so many such cases, and they all end with clients giving up on CF, shouldn’t CF take the initiative to find out what exactly happens in these cases?
Because everything works fine without CF, and when you introduce CF, Googlebot stops working.
Our web hosting providers say they do not block any CF or Googlebot traffic…

I think CF should be at the centre of finding out what the problem is, instead of just neglecting the hundreds of people with this problem, given that it only appears with CF.

Can you please share a screenshot of the Cloudflare dashboard showing the 404 for sitemap.xml?

That’s a huge problem, and may be key in fixing this problem. See if your host can figure out the 404. That would be ironic if the host’s 404 for a sitemap presented a 403 to Google for some unknown reason.

What would be helpful would be if Google could show you the entire result from the “Couldn’t Fetch” attempt, including the HTML that came back with the error. That should show if it’s a Cloudflare 403 page, or one from your server.
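
Until Google shows that, one rough approximation is to request the sitemap yourself with Googlebot's user agent string and look at the status, headers and the start of the body. This is only a sketch: the request comes from your own IP rather than a Google one, so Cloudflare or the origin may treat it differently from the real crawler. The helper names `build_request` and `fetch` are hypothetical.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return (status, headers, first 2000 characters of the body), even for 4xx/5xx."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read(2000).decode("utf-8", "replace")
            return response.status, dict(response.headers), body
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries headers and body.
        body = err.read(2000).decode("utf-8", "replace")
        return err.code, dict(err.headers), body


# Example call (needs network access, so it is left commented out):
# status, headers, body = fetch("https://ugwatch.com/sitemap-2.xml")
# print(status, headers.get("Server"), headers.get("CF-RAY"))
# print(body)
```

Responses proxied by Cloudflare normally carry `Server: cloudflare` and a `CF-RAY` header whether the error was generated by Cloudflare or by the origin, so the body of the error page is usually the more telling part.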

Could it be some kind of bot protection on the origin that is not aware it’s behind a proxy? It would block Googlebot when the connecting IP (in this case Cloudflare’s) is not a known Google IP.
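
If that is what is happening, the check on the origin would need to use the visitor address that Cloudflare passes along in the `CF-Connecting-IP` header instead of the connecting address. Below is a minimal sketch of that idea, combined with the reverse-then-forward DNS check Google documents for verifying Googlebot. The function names and the injectable lookup parameters are assumptions made for illustration; this is not how any particular host or security plugin implements it, and the default forward lookup handles IPv4 only.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def real_client_ip(remote_addr, headers, is_trusted_proxy):
    """Use CF-Connecting-IP only when the connection comes from a trusted proxy."""
    forwarded = headers.get("CF-Connecting-IP")
    if forwarded and is_trusted_proxy(remote_addr):
        return forwarded
    return remote_addr


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward_lookup(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False


# Offline demonstration with fake lookups (illustrative values, no DNS traffic).
headers = {"CF-Connecting-IP": "66.249.66.1"}
visitor = real_client_ip("172.69.70.116", headers, lambda ip: True)
print(visitor)
print(is_verified_googlebot(
    visitor,
    reverse_lookup=lambda addr: "crawl-66-249-66-1.googlebot.com",
    forward_lookup=lambda host: ["66.249.66.1"],
))
```

A check that runs the same verification against 172.69.70.116 (the Cloudflare address) would fail, because that address does not resolve to a Google hostname. That matches the symptom in this thread: Googlebot is rejected only when the site is behind CF.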
