Sitemap not crawled correctly by Google - Error 403

Hey there,

I have an issue related to Cloudflare: Google cannot correctly crawl part of my sitemap.
The part of the sitemap that has the issue is at the following address: https://www.yogicyantra.com/product-sitemap.xml

Even though a normal user can see it, Googlebot cannot. This is the response that Googlebot gets when it tries to index that part of the sitemap:

Can someone help me? That would be really appreciated :slight_smile:

Ah, by the way, this is the Search Console tab showing the error mentioned above:

For starters, does your server IP address end in 119?

No Sandro, it should end with 159.

All right, would you feel comfortable sharing it here (even just temporarily)? Otherwise run a request with the IP at sitemeer.com, post the time when you ran it back here, and I will dig it out.

So far it seems you have a configuration somewhere which prevents user agents with “bot” in them from accessing your site, but that's all I can say with the information I have so far.

By the way, thank you very much :slight_smile:

Got the IP address. You can remove the posting if you wish.

Yes, your server immediately closes the connection when it receives a request with “bot” in the user agent. You will need to check your server configuration for that directive and remove it, or at least exempt Google. If you want to keep this configuration, an IP-based whitelist (to exempt Google) is probably the better approach, as otherwise anybody impersonating Google could get through as well.
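If you do go the whitelist route, a common way to verify that an address really belongs to Googlebot is a reverse DNS lookup followed by a forward lookup, roughly like this (the IP below is just an illustrative Googlebot address, not necessarily one from your logs):

host 66.249.66.1
# should resolve to something like crawl-66-249-66-1.googlebot.com
host crawl-66-249-66-1.googlebot.com
# the forward lookup should point back to the same IP; if both match, it is a genuine Googlebot address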

Also, though unrelated to your issue, your server certificate has expired, hence you can only use “Full” as SSL mode on Cloudflare and not “Full (strict)”. The latter would be more secure, and all you need to do is renew your certificate.
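If you want to double-check the expiry date directly on your origin (bypassing Cloudflare), something along these lines should show it. Replace HEREGOESYOURACTUALIP with your server's IP address:

echo | openssl s_client -connect HEREGOESYOURACTUALIP:443 -servername www.yogicyantra.com 2>/dev/null | openssl x509 -noout -dates
# prints the certificate's notBefore/notAfter dates as served by your origin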

Thank you very much Sandro, I will explore this issue with my server provider.
Thank you! :slight_smile:


Should they not believe you, you can easily demonstrate it with the following cURL call. Replace HEREGOESYOURACTUALIP with your IP address:

curl -vk -H "User-agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --resolve www.yogicyantra.com:443:HEREGOESYOURACTUALIP https://www.yogicyantra.com/product-sitemap.xml

Actually, you don't even need the Googlebot user agent; any bot-ish user agent will do.
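For example, a request with an arbitrary made-up user agent such as “somebot” should trigger the same connection reset (again with your actual IP in place of the placeholder):

curl -vk -H "User-agent: somebot" --resolve www.yogicyantra.com:443:HEREGOESYOURACTUALIP https://www.yogicyantra.com/product-sitemap.xml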

Hey Sandro,

I contacted the server provider, and it seems that the problem was caused by a security plugin on my website. The plugin is called iThemes Security, and it was banning Googlebot's IP addresses because they were hitting too many 404s.

Thanks to your suggestions I was able to figure it out! :slight_smile:

Thank you again!
