Googlebot returning 50x errors in Webmaster Console

Ever since I started using Cloudflare’s load balancer I have been seeing Server error (5xx) reports from Googlebot (around 7,000 pages). It’s mostly category and tag pages, but there are also a few dozen article pages, and every time Googlebot hits one of these errors it deindexes the page from Google search.

This is obviously a huge problem for me, and it’s especially concerning because I can’t reproduce the error in my own testing. I think it may be the Cloudflare load balancer throttling Google’s bot traffic. Here’s the thing: I’m still on the ‘Free’ Cloudflare plan, which doesn’t come with any customer support, even though I’m paying $20/month for the load balancer service.

I have already planned to upgrade to Cloudflare’s Pro plan for its bot-management functionality, which simply doesn’t exist in the load balancer’s custom rules section, and I should be able to start debugging this issue more clearly once I’m on the Pro package.

Nevertheless, I would appreciate any advice. The website runs WordPress, and there are only two endpoints: each is a Kubernetes cluster with multiple web servers serving the site, with an Nginx ingress controller in front of each cluster acting as the endpoint Cloudflare talks to.

In terms of traffic request limits, I have this set up on the controllers:
limit-burst: “2500”
limit-rpm: “800”
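(For reference, if this is the community ingress-nginx controller, these limits live as annotations on the Ingress object. A sketch — the ingress name and the multiplier value are placeholders; `limit-rpm` and `limit-burst-multiplier` are documented ingress-nginx annotations, and as far as I know there is no standalone `limit-burst` key there, since burst is derived as rate times multiplier:)

```yaml
# Hedged sketch, assuming the community ingress-nginx controller.
# "my-ingress" and the multiplier value are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rpm: "800"
    # burst size = limit rate x this multiplier (ingress-nginx default is 5)
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
```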

But since proxying is disabled, we should be seeing the visitor’s IP address rather than Cloudflare’s; otherwise my guess would simply be that Cloudflare’s IPs are getting blocked by the limits set up here.

I might double or triple these numbers for the next few days as a temporary measure, on the off chance they’re causing the issue; otherwise I think the problem lies elsewhere. Aside from general advice, I’d also appreciate tips on good testing services for this issue. I can’t spend huge money, but a service in the $10-30/month range would work.
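One free check I’m also doing in the meantime is tallying what status codes Googlebot actually receives, straight from the Nginx access logs. A sketch, assuming the default “combined” log format — the two sample lines below stand in for a real log file such as /var/log/nginx/access.log:

```shell
# Sample "combined"-format log lines standing in for the real access log.
cat > /tmp/sample-access.log <<'EOF'
66.249.66.1 - - [01/Jan/2024:00:00:00 +0000] "GET /tag/foo/ HTTP/1.1" 502 157 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [01/Jan/2024:00:00:01 +0000] "GET /2024/01/post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF
# Split each line on double quotes: field 6 is the user agent,
# field 3 holds " status bytes "; the first word of it is the status code.
awk -F'"' '$6 ~ /Googlebot/ { split($3, a, " "); counts[a[1]]++ }
           END { for (s in counts) print s, counts[s] }' /tmp/sample-access.log
```

Pointed at the real log on each ingress controller, this shows per-status counts for Googlebot traffic without any paid testing service.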

Just to provide an update: from my own testing, I have discovered the errors are coming from at least one of my endpoints:

:~$ ./googlebot-test.sh

Successful requests: 423
502 errors: 40
404 errors: 0
Other errors: 578

I also noticed the ‘last applied configuration’ annotation on the Kubernetes ingress contained a much lower number than the one I added to the ingress config, so it’s possible it just didn’t update automatically. I have since manually edited the last-applied config (in the JSON block), and I’m going to redo this test.

But I’m guessing the most likely culprits are either this or ModSecurity being enabled on the ingress. Further testing is required on my part now. But it’s definitely not Cloudflare!

The ‘other errors’ are actually mostly 301 and 302 responses, so not really errors; I just wanted to clarify that. I’m now improving my bash script to break down the responses more specifically.
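The gist of the improvement is a small bucketing function over the HTTP status codes; the function name and bucket labels here are my own choices:

```shell
# Sketch of the status-code bucketing being added to googlebot-test.sh.
# Redirects get their own bucket instead of being lumped into "other errors".
classify() {
  case "$1" in
    2??)             echo "success" ;;
    301|302|307|308) echo "redirect" ;;
    404)             echo "not found" ;;
    5??)             echo "server error" ;;
    *)               echo "other" ;;
  esac
}

classify 200   # success
classify 302   # redirect
classify 502   # server error
```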

Hi there,

You’re using a load balancer rule with proxied-only fields on an unproxied (DNS-only) domain.
This might not be the only issue, but it certainly doesn’t help.

Consider orange-clouding the apex and www so you can use the Layer 7 load balancer. That option will bring many more features to the table.
Read more about it here

Take care.

I had the Layer 7 proxy off because it was slower than the DNS load balancer.

But I’ll retest again in future.

The rules seemed to work fine with DNS-only routing. Are you sure Cloudflare isn’t still able to read the URI and reroute requests without the proxy enabled?

Some fields it can read; some it can’t.

This might be a configuration issue.
A L7 load balancer is better and more reliable as it doesn’t depend on resolvers on the client side. For instance, if I just visited your non proxied website and got directed to origin A 5 minutes ago, and now I try visiting again, I will going on the same origin A and the load balancer won’t even be consulted because my system already knows that your origin DNS points to X IP and for the most of systems, it won’t be checked again that often. this means, that after I visited your website the 1st time, until my system decides to recheck DNS, I will always be redirected to that same origin.
Plus, L7 lets you use the cache, the WAF, and other features that will ultimately make your website faster and more secure while reducing the load on your origin even further.

I’ve worked hard on my site’s performance; I can almost guarantee it’s faster than Cloudflare’s cache. I have 96 endpoints across two load balancers, served by a couple of Kubernetes nodes.

That’s why I run my own WAF rather than relying on Cloudflare or other such L7 services.

Cloudflare has millions of clients, so latency goes up quite a lot on that shared infrastructure. I’m sure the big corporate clients get priority on your network too.

I also moved off cloud providers like AWS because they charge a lot of money versus just renting dedicated servers, something along the lines of 5-10x, and they pull some underhanded overselling tactics once they think they have you on long-term plans; I mean they squeeze the resources.

So in my experience dealing with cloud services, small companies like us either overpay or get a bad service.

I’m getting good performance with the DNS-only load balancer from Cloudflare right now, so I’ll keep it, but I’ll test L7 at some point in the future.


This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.