520 Errors ~15% of the time on all domains

I’ve been using Cloudflare for years and it’s never given me a problem until now. For the past month and a half, a 520 error has been showing up at random when going to any of my sites, and it has slowly become more frequent. There is no rhyme or reason to it: it just happens, and upon refreshing the page it is gone. If you sit there and spam the refresh button, you’ll see it happen about 15-20% of the time.
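
A minimal sketch of how the "spam the refresh button" test could be automated to get a steadier number than eyeballing it. The function names and the User-Agent string are made up for illustration, and it assumes plain sequential GET requests are a fair stand-in for browser refreshes.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for one GET request to url."""
    req = urllib.request.Request(url, headers={"User-Agent": "520-sampler/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return exc.code


def summarize(statuses):
    """Return (number of 520s, total requests, fraction of 520s)."""
    total = len(statuses)
    n520 = Counter(statuses)[520]
    return n520, total, (n520 / total if total else 0.0)


def sample(url: str, n: int = 100, fetch=fetch_status):
    """Request url n times in sequence and summarize the 520 rate."""
    return summarize([fetch(url) for _ in range(n)])
```

Calling `sample("https://example.com/", n=100)` with an affected URL in place of the placeholder returns the count, the total, and the fraction of 520 responses. Connection-level failures (timeouts, resets) are not caught here and will raise.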

I thought it might be something on my end, as all of the troubleshooting information suggests, but the following tests have shown me that the problem occurs only when traffic goes through Cloudflare’s proxy.

  1. If I revert the nameservers for 1 of the affected domains, the problem goes away. Works 100% of the time.
  2. If I point my desktop to the web server directly using my hosts file, the problem goes away. Works 100% of the time.
  3. If I point an external server to my WAN IP directly using the hosts file, the problem goes away. Works 100% of the time.
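
The hosts-file tests above can also be run without editing any hosts file. The following is a rough sketch, assuming the origin serves HTTPS on port 443 with a certificate valid for the site's hostname; `status_via_ip` and its parameters are hypothetical names, and the `tls=False` switch exists only so the helper can be tried against a plain-HTTP test server.

```python
import http.client
import socket
import ssl


def status_via_ip(hostname: str, ip: str, path: str = "/", port: int = 443,
                  tls: bool = True, timeout: float = 10.0) -> int:
    """GET path from the server at ip while presenting hostname (SNI and Host).

    This bypasses DNS in the same way a hosts-file entry does.
    """
    sock = socket.create_connection((ip, port), timeout=timeout)
    try:
        if tls:
            # Validate the certificate against hostname, not against the IP.
            ctx = ssl.create_default_context()
            sock = ctx.wrap_socket(sock, server_hostname=hostname)
            conn = http.client.HTTPSConnection(hostname, port, timeout=timeout)
        else:
            # Plain HTTP, only meant for trying the helper locally.
            conn = http.client.HTTPConnection(hostname, port, timeout=timeout)
        # Hand over the already-connected socket so http.client never
        # resolves hostname itself.
        conn.sock = sock
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        sock.close()
```

Running this in a loop against the origin's WAN IP, and comparing the results with the same loop against the proxied hostname, gives the same comparison as tests 2 and 3.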

The only time the issue arises is when the nameservers point to Cloudflare and Cloudflare is the proxy. I have tried the following in Cloudflare to isolate the responsible setting, and none of it resolved the issue:

  1. Clear Cloudflare website cache
  2. Turn on Development Mode
  3. Turn off HTTP/2/3 to the Origin
  4. Try different SSL modes. It is currently set to Full (Strict).

The only things I have found that work are setting the records to DNS Only, so that Cloudflare no longer proxies the traffic, or pausing Cloudflare on the domain.

Some details about my environment: I run my websites on a Kubernetes cluster. In front of it is a Traefik instance that uses cert-manager to obtain and manage certificates from Let’s Encrypt. All port 443 traffic comes in through my UDM Pro, which forwards it directly to Traefik for processing.

This is extremely frustrating. I’ve been digging into this and troubleshooting for about a week straight now with no luck in resolving it. Sometimes it seems like something I did has helped, and then the error starts popping up again with no further changes.

Ray ID: 73b423809e9d18aa

I’ve done all the troubleshooting and suggestions mentioned elsewhere on the forums and other places on the internet. It is not an issue with the origin server. It is an issue with Cloudflare’s proxy or configuration of it, none of which has really changed since before the issue started happening.

Greetings,

Thank you for asking.

I am sorry to hear you are experiencing an issue.

Are all the websites on the same server? :thinking:
Is there no firewall?
Have you tried using a different hosting/server provider with the same or a similar setup?
How about not using that kind of setup for one website? Does it work then? If so, then I am afraid it’s not Cloudflare, but rather the origin or the way the services you mentioned proxy requests (behind a firewall or not, with SSL).

This reminds me a bit of issues I had with Imunify360’s WebShield SSL proxy manager. It had no SSL certificate of its own, even though Nginx did, and it proxied traffic from one port to another, then to Nginx, and back out the same way, which got me 520 and/or 526 errors.
Even when I entered the SSL certs for the websites (manually, as that case requires), it was still not the best solution. So I disabled that feature, and everything works fine.

And have you allowed Cloudflare IPs, and are you restoring the real visitor IP in the log files at your … proxy and origin? :thinking:

I’d kindly suggest that you open a ticket with Cloudflare Support about your account and/or domain issue and share the ticket number here with us so we can escalate it:

  • Log in to Cloudflare and then contact Cloudflare Support by clicking the Get More Help button. If you get an automatic reply, reply to it, indicate that you need more help, and reference this topic
  • Or send an e-mail to support[at]cloudflare[dot]com from the e-mail address associated with your Cloudflare account

Furthermore, if you have been through all of the suggestions above, are not seeing corresponding issues on your network/server, and have a ticket number with Cloudflare, please reply and post that ticket number here.

To enable efficient troubleshooting by support, please ensure you include the following on the ticket:

  • example URL(s) where you are seeing the error
  • Ray IDs from the 520 pages
  • output from a traceroute from any impacted user
  • output of example.com/cdn-cgi/trace - replace example.com with the affected domain.
  • Also include two HAR files: one detailing your request with Cloudflare enabled on your website and the other with Cloudflare temporarily disabled (see “How do I temporarily deactivate Cloudflare”)
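
For the `/cdn-cgi/trace` item, a small sketch of how the output could be collected for the ticket. The endpoint returns plain-text `key=value` lines; the helper names and User-Agent string below are made up for illustration.

```python
import urllib.request


def parse_trace(text: str) -> dict:
    """Parse the key=value lines returned by /cdn-cgi/trace into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first "=" only, since values may contain "=" too.
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fetch_trace(domain: str, timeout: float = 10.0) -> dict:
    """Fetch and parse https://<domain>/cdn-cgi/trace for a proxied domain."""
    url = f"https://{domain}/cdn-cgi/trace"
    req = urllib.request.Request(url, headers={"User-Agent": "trace-collector/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_trace(resp.read().decode("utf-8", errors="replace"))
```

Fields such as `colo` (the Cloudflare data center that served the request) and `ip` are worth pasting into the ticket alongside the Ray IDs.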

I have a 12-node Kubernetes cluster, 9 nodes of which can run workloads, so a site could be on any of those physical nodes. But yes, they are in the same “system.”

I have a UniFi UDM Pro as my main router/firewall. The only port open/forwarded on it is 443, and all traffic to that port goes to the Traefik proxy IP which terminates SSL and sends it to the appropriate service.

I am self-hosting everything, which hasn’t been a problem for years, so I am unsure why it is suddenly an issue without any fundamental changes.

I will give it a try and report back.

Yes, I originally only allowed Cloudflare IP addresses through my firewall, but for the sake of testing, I am now allowing all 443 traffic in.
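
For anyone who wants to verify an allowlist like that, here is a rough sketch that checks whether an address seen in the firewall or proxy logs falls inside Cloudflare's published ranges. It assumes the two plain-text lists at the URLs below (one CIDR per line) are the source of truth; the function names are hypothetical.

```python
import ipaddress
import urllib.request

# Cloudflare's published plain-text range lists, one CIDR per line.
CLOUDFLARE_LISTS = (
    "https://www.cloudflare.com/ips-v4",
    "https://www.cloudflare.com/ips-v6",
)


def parse_networks(text: str):
    """Turn newline-separated CIDR strings into ip_network objects."""
    return [ipaddress.ip_network(line.strip())
            for line in text.splitlines() if line.strip()]


def fetch_cloudflare_networks(timeout: float = 10.0):
    """Download and parse the current IPv4 and IPv6 range lists."""
    networks = []
    for url in CLOUDFLARE_LISTS:
        req = urllib.request.Request(url, headers={"User-Agent": "cf-range-check/0.1"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            networks.extend(parse_networks(resp.read().decode("ascii")))
    return networks


def is_from_cloudflare(ip: str, networks) -> bool:
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

If the allowlist on the firewall was built from an older copy of the ranges, comparing it against `fetch_cloudflare_networks()` would show whether any current range is missing.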

I’ll submit that now. I figured posting here might get a quicker response after hearing how long you have to wait when you only have a free-tier account. Which reminds me: in another post I saw about this same issue, the problem went away after upgrading to the Pro plan and returned after the membership went back to Free:

Exactly the same thing happened to me, and I tried all the existing recommendations in official and unofficial forums. The problem is something on Cloudflare’s side; I tried another hosting provider and the error continued.
It’s a shame, because the only option left is to disable it.

If anyone has any other solutions or answers from Cloudflare, please share!

The same thing has been happening to me since last month. I had to remove all my sites from Cloudflare and go back to my normal DNS, and everything works fine now. :frowning_face: It’s very sad to have to leave Cloudflare after many years. I think the problem started when they changed to the new method of human verification.

That’s unfortunate. I ended up paying for the pro plan on one of my domains and have had a ticket open for almost 4 days now with no response yet. So much for 4-hour response times…

The service level objective (SLO) on the Pro plan is 5 days, not 4 hours. The Business plan, which starts at ten times the rate of the Pro plan, doesn’t include a 4-hour response even for the highest-severity issues.

I know that there is some confusing language somewhere on the Cloudflare site that references responses occurring faster than the SLOs and SLAs, but I suspect that takes into account the rapid responses one can receive here in the Community.

I went to the pricing section on the main website and noticed the “initial response time” to tickets, but when you go to the domain in the dashboard and click to change plan, this is what I was referring to, which comes across a little misleading:

You are not wrong about that. As an outsider, I don’t know how that figure is derived. If my opinion were to be solicited on whether that language should be changed, my answer would be an unequivocal “Yes”.

You have my sympathies for your current circumstance, as inconsistent errors are the worst to troubleshoot.
