Occasional 504 error, most definitely no issues at our end


#1

Hello,

We are using a load balancer with two hosts. Our hosts are perfectly healthy, there are no error messages at our end, and all requests complete in under 300 ms. When we open the endpoints directly in a web browser, there are no errors and everything works smoothly (servers in Germany, accessing from Vienna).

However, when going through Cloudflare, we occasionally get 504 errors, always after a timeout of 10 seconds.

Here is some sample data from the specific requests that failed:
Ray ID: 43e9b5564f1159d2 2018-07-22 23:32:55 UTC
Ray ID: 43ed9078bd7659d2 2018-07-23 10:46:48 UTC
Ray ID: 43ed95d6f9e8597e 2018-07-23 10:50:28 UTC
Ray ID: 43ed99048f38597e 2018-07-23 10:52:38 UTC

The requests occur every couple of minutes, sometimes more frequently.

I’m pretty sure that we are not doing anything wrong at our end: none of these errors are registering at our end, and following exactly the same workflow while bypassing Cloudflare never fails to connect. It looks like Cloudflare occasionally fails to establish a TCP connection and times out with HTTP 504 for reasons that are unclear to us, and I can only think that something is wrong at Cloudflare’s end.
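
For anyone who wants to collect the same kind of evidence, here is a minimal sketch of a probe that fetches a URL through Cloudflare and records the status, the elapsed time, and the `CF-RAY` response header. The function names, the 0.5-second tolerance, and the 30-second client timeout are my own assumptions for illustration; only the 504 status and the 10-second delay come from what we observed.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=30.0):
    """Fetch url once and return (status, elapsed_seconds, cf_ray).

    Connection-level failures (DNS, refused, client timeout) are not
    caught here and propagate to the caller.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, headers = resp.status, resp.headers
            resp.read()  # drain the body so the timing covers the full response
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the error object still carries the headers.
        status, headers = err.code, err.headers
    elapsed = time.monotonic() - start
    return status, elapsed, headers.get("CF-RAY")


def looks_like_edge_timeout(status, elapsed, expected=10.0, tolerance=0.5):
    """True for a 504 that arrived about `expected` seconds after the request.

    The 10-second default mirrors the delay seen in this thread; the
    tolerance is an arbitrary choice.
    """
    return status == 504 and abs(elapsed - expected) <= tolerance
```

Calling `probe()` in a loop against your own proxied URL and logging every sample where `looks_like_edge_timeout()` returns true would give a list of Ray IDs and timings to hand to support. It does not show where the connection stalls, only that the pattern matches.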

Interestingly, we have two domains in our account, and it doesn’t happen with the second domain, which has an identical configuration and is routed to the same servers.

Please help.

Thanks,
George.


#2

Nobody else has responded, so I’ll give it a try.

Are you getting the pretty Cloudflare error screen? The Support docs say it’s getting the 504 from your server, but you say there are no error messages. I know you said you have two domains configured the same way and one is ok.

Could there be a firewall that’s triggering and blocking access?

Is this on the busier of the two domains?


#3

Hi sdayman,

Thanks for the hints, but I think we looked everywhere already, including those pages and Google. There’s barely any load on the domains, and I checked it with exactly the same tests from the same geographical location while bypassing Cloudflare - we experienced no issues. Both domains are served from the same servers by the same nginx instance. I’d be willing to post any config files on request and provide more information.

I suspect the problem could be in the routing between Cloudflare and our servers, affecting some of their nodes but not others. We are probably not the only company having a similar problem, so maybe somebody with access to the Ray ID logs would have better insight into what’s going on. We are pretty confident that it’s something non-trivial, and, provided that it leads to a solution, we are willing to pay a reasonable hourly rate for the engineers’ investigation of this problem if required.

Our servers did not register any errors, and the connections that returned 504 at the client side didn’t even reach the server (nothing in the logs for those timestamps). The server was certainly alive throughout the entire time, without even small gaps. We even tested it in parallel with Cloudflare by connecting another browser window directly, and only the Cloudflare instance of the web app registered errors, while the other one was completely free of errors.

The errors always come after 10 seconds (with a few milliseconds variance), and the requests are not registered at the server’s end. If these were real timeouts, I’d expect at least the incoming requests to register in the nginx logs, or some long-running requests exceeding 10 seconds to show up in the non-Cloudflare browser window, and neither was visible. So the only thing I can think of is unreliable routing from some Cloudflare nodes.
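
The log check described above can be roughly sketched as follows. This assumes the nginx access log uses the default "combined" timestamp format (for example `[23/Jul/2018:12:46:48 +0200]`) and that the failures are listed as `Ray ID: <id> <date> <time> UTC` lines like the ones in my first post. The function names and the 15-second window are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches lines such as "Ray ID: 43ed9078bd7659d2 2018-07-23 10:46:48 UTC".
RAY_LINE = re.compile(r"Ray ID: (\w+) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC")
# Assumed nginx "combined" timestamp, e.g. [23/Jul/2018:12:46:48 +0200].
NGINX_TIME = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_ray_lines(text):
    """Return a list of (ray_id, utc_datetime) found in text."""
    rays = []
    for match in RAY_LINE.finditer(text):
        ts = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S")
        rays.append((match.group(1), ts.replace(tzinfo=timezone.utc)))
    return rays


def parse_nginx_times(log_lines):
    """Return timezone-aware datetimes for every log line with a timestamp."""
    times = []
    for line in log_lines:
        match = NGINX_TIME.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    return times


def unmatched_rays(rays, log_times, window_seconds=15):
    """Return Ray IDs with no log entry at all within the time window."""
    window = timedelta(seconds=window_seconds)
    return [
        ray_id
        for ray_id, ts in rays
        if not any(abs(t - ts) <= window for t in log_times)
    ]
```

This is a coarse check: a Ray ID that comes back unmatched means nothing at all reached nginx around that time, which is what we saw. A match only means some request was logged nearby, not necessarily the failing one, so on a busier server the log would need to be filtered by path or client address first.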

Thanks,
George.


#4

Those Ray ID entries are certainly useful. Have you opened a Support ticket? You can email your findings to: support AT cloudflare DOT com


#5

Great, I’ll mail the summary to the support address.

