I have been having issues getting Googlebot to crawl my main webpage URL.
It started when I noticed that my WordPress site has no description in Google search results (not sure when it stopped working; there are no issues on Bing and Yahoo, though).
Google Search Console shows that the main page URL has been indexed without content, and live tests on the URL fail. Strangely, the other pages of my website get crawled without issue; it’s just the main URL that is having problems.
I tried optimizing the site on advice from the Google help forums, but it was no help, so I tried disabling Cloudflare to see if something in there was causing the issue. Googlebot was able to crawl without issue while Cloudflare was disabled.
Since I’m using Cloudflare to serve my SSL, leaving it disabled is not an option, so I re-enabled it and added a rule to allow known bots through, thinking it should fix my problems. Although it’s not as bad as before, the crawl still fails half the time.
I have triggered another page validation in Google Search Console, but I don’t have high hopes that it will get lucky and crawl the page without issue, so any advice I can get here would be most helpful.
This is an issue that is close to impossible to reproduce. But is there any pattern in the positive vs negative results?
Intermittent results can be caused by the origin server slowing down after too many requests and returning 4xx and 5xx errors. Googlebot, despite its claims of having sophisticated algorithms, can be a voracious reader at times. You may wanna try instructing Googlebot to limit its crawl rate for your site and see if that solves the problem.
The Google support page I linked to above shows you how to do it, and links to the proper tool (the legacy GSC), in case you wanna give it a shot.
I have managed, however, to see the blank page you’re talking about by changing the User Agent in Chrome to “Googlebot mobile crawler”. Not only on the home page but also on the About Us page, I saw nothing but a white screen as a result.
The page is not being redirected. Instead, it returns an HTTP 200 status code and all the headers of a normal page, but without any content. Cloudflare does not have any feature that, out of the box, would result in a blank page. Cloudflare security features return an error page or a captcha. You’d have to use Workers to get the blank page.
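If you want to repeat that User Agent test outside the browser and look for a pattern, something like the rough Python sketch below could help. It requests a URL several times with a Googlebot-style User Agent and counts how often the response is a 200 with an empty body. The User Agent string is an assumption (a Googlebot smartphone string with a placeholder Chrome version), and the names `classify`, `probe` and `tally` are made up for this example. It only flags a completely empty body, so a page that returns an HTML shell with nothing visible in it would still count as "ok".

```python
import urllib.error
import urllib.request
from collections import Counter

# Assumption: a Googlebot smartphone User-Agent string with a placeholder
# Chrome version (W.X.Y.Z). Swap in whichever string you want to test with.
GOOGLEBOT_MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def classify(status: int, body: bytes) -> str:
    """Label a response as 'error', 'blank' (200 with an empty body) or 'ok'."""
    if status >= 400:
        return "error"
    if status == 200 and not body.strip():
        return "blank"
    return "ok"


def probe(url: str, user_agent: str = GOOGLEBOT_MOBILE_UA, timeout: float = 10.0) -> str:
    """Fetch the URL once with the given User-Agent and classify the result."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status, response.read())
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError by urllib.
        return classify(exc.code, b"")
    except OSError:
        # DNS failures, refused connections, timeouts and similar.
        return "unreachable"


def tally(url: str, attempts: int = 10) -> Counter:
    """Probe the URL several times and count each outcome."""
    return Counter(probe(url) for _ in range(attempts))
```

Calling `tally("https://example.com/")` with your own URL in place of the placeholder returns a count per outcome. Keep in mind that these requests only imitate Googlebot's User Agent and come from your own IP, so they may be handled differently from the real crawler.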
In all likelihood this is something that’s being done at your origin. Did you set up any firewall plugin? It may have a bug that is making the legitimate Googlebot receive a blank page that perhaps was meant for Googlebot impersonators.
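For reference, the usual way to tell the legitimate Googlebot from an impersonator is a reverse DNS lookup on the visitor IP followed by a forward lookup that must resolve back to the same IP. Below is a minimal Python sketch of that check, assuming the hostname suffixes `.googlebot.com` and `.google.com`; the function names are made up for this example, and the forward lookup used here only covers IPv4. Behind Cloudflare your origin sees Cloudflare's IPs unless it reads the visitor address from the `CF-Connecting-IP` header, so a plugin doing this kind of check against the wrong IP would misjudge the real Googlebot.

```python
import socket

# Assumption: hostname suffixes treated as Google crawler hosts in this sketch.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname: str) -> bool:
    """Return True if the hostname ends with one of the assumed Google suffixes."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def is_real_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the hostname, then confirm it resolves back."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not is_google_hostname(hostname):
            return False
        # Forward lookup (IPv4 only): the hostname must map back to the same IP.
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # No PTR record, failed lookup, or malformed address.
        return False
    return ip in addresses
```

Running `is_real_googlebot()` on the visitor IPs from your origin logs for the requests that got a blank page would show whether they really came from Google.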