HTTP 524 for a domain behind a Cloudflare Tunnel, only for certain users hitting certain Cloudflare datacenters/PoPs.
Preamble: I’ve checked the docs mentioning the 100-second timeout limit for proxied domains and the recommendation to check backend response times when an HTTP 524 is returned by Cloudflare.
I had submitted a ticket, where the auto-close bot mentioned waiting a few days on a community forum post before escalating.
We have a subdomain yyy.domain.tld served by a Cloudflare/Argo tunnel, and we see 50%+ of requests fail with an HTTP 524 timeout only when routed via certain Cloudflare datacenters/PoPs - SIN (and possibly BLR). Testing across other clients shows requests working fine (returning in <3s) for users going via HYD or BOM (e.g. cf-ray 664cd597afcf4aff-HYD), and no issues are seen on the backend server(s). However, users physically at BOM (and some other users on various residential ISPs) consistently report cf-ray showing requests going via SIN and receiving HTTP 524 (e.g. 668715abfcbd2f5b-SIN).
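For context, one quick way to see which colo a client is being routed through (and whether it gets a 524) is Cloudflare's /cdn-cgi/trace endpoint plus the cf-ray response header; a minimal sketch, using the subdomain above as the hostname:

```bash
# Which Cloudflare colo is this client hitting? Look for the colo= field, e.g. colo=SIN
curl -s https://yyy.domain.tld/cdn-cgi/trace | grep -E '^(ip|colo)='

# Check the response status and the cf-ray header (its -XXX suffix is the colo)
curl -s -o /dev/null -D - https://yyy.domain.tld/ | grep -iE '^(HTTP|cf-ray)'
```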
I don’t see any incidents on the Cloudflare status page for SIN, though. This has been occurring for nearly a week.
This is impacting 150+ team users, and we’d like Cloudflare to investigate and give us more information: is it expected for a user who is physically (and network-wise) in BOM to be routed via SIN or HYD instead of BOM, and can we look into rerouting, or into issues at the Cloudflare layer, if SIN/BLR is having trouble reaching our application that BOM/HYD is not?
Thank you so much for your quick reply. We need urgent help from your side to resolve our current issue; could we have a quick call to discuss it?
Hello!
From what I can see, the issue is not a Cloudflare Tunnel 524 tied to a specific colo.
Checking the logs, the host is reporting 524 errors across multiple geolocations.
I ran tests from multiple colos and I can verify the 524 here; however, it seems to be an issue on the origin side.
We see employees in the Midwest, where our corporate office is, getting 524s, yet we see no issues with the origins. The problem then seems to grow out of control from there, affecting every page load for that employee.
We’ve learned that restarting the cloudflared service resolves the problem. And if the restart takes more than a second or two, you’ve identified the clogged-up service.
We’re currently monitoring 524 responses, and when they start to grow, we restart the cloudflared service. We have no other way of identifying why this is happening.
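A rough sketch of that kind of watchdog (the URL, threshold, and interval below are placeholders, not our exact values) would look something like this:

```bash
#!/usr/bin/env bash
# Naive watchdog: probe the tunneled hostname and restart cloudflared
# once a few consecutive probes come back as HTTP 524.
URL="https://yyy.domain.tld/"   # placeholder probe URL
THRESHOLD=3                     # consecutive 524s before restarting
INTERVAL=30                     # seconds between probes
count=0

while true; do
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 110 "$URL")
  if [ "$status" = "524" ]; then
    count=$((count + 1))
  else
    count=0
  fi

  if [ "$count" -ge "$THRESHOLD" ]; then
    echo "$(date -Is) saw $count consecutive 524s, restarting cloudflared"
    systemctl restart cloudflared
    count=0
  fi
  sleep "$INTERVAL"
done
```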
@Dr.R0b0tn1k hi! Can you please create a new thread for your issue and add all the details there?
As far as I can see, an engineer already replied on ticket #2171020, but you never got back to us with your logs. And on ticket #2173805 an engineer answered directly with details. Without your cloudflared.log it would be hard to shed any light on your issue.
Please create another thread, or open a new ticket, and share with us:
cloudflared logs
your config.yml configuration
the command you used to create and route your tunnel.
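For reference, a hypothetical example of how to gather these (the paths, tunnel name, and hostname below are placeholders; adjust them to your own setup):

```bash
# Recent cloudflared logs (assuming cloudflared runs as a systemd service)
journalctl -u cloudflared --since "1 hour ago" > cloudflared.log

# The config the tunnel runs with (default path; typically contains the
# tunnel UUID, credentials-file path, and ingress rules)
cat /etc/cloudflared/config.yml

# Example of the commands typically used to create and route a named tunnel
cloudflared tunnel create example-tunnel
cloudflared tunnel route dns example-tunnel app.example.com
cloudflared tunnel run example-tunnel
```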