I’m receiving intermittent 524 (origin timeout) errors on ~1% of requests, even though my application logs show that it sent a response well within the 100-second timeout.
Client call (curl):
curl -I https://****/someurl
date: Wed, 27 May 2020 23:25:12 GMT
HttpOnly; SameSite=Lax; Secure
cache-control: no-store, no-cache
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
My server log shows a 200 response after 43 seconds:
May 27 23:24:16 nginx default[api-dosfo-5bb79d4d48-22b8j] 10.244.8.59 - - [27/May/2020:23:24:16 +0000] "GET /someurl HTTP/1.1" 200 30499 "-" "curl/7.64.1" "2a01:4b00:864d:7a00:2cc4:5fe7:f6ac:9313" "59a37bf77d02002a-LHR" "GB" 44.031 43.796 : 0.228 .
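To cross-check failing Ray IDs against the origin log, the line above can be parsed mechanically. The following is only a minimal sketch: `parse_line` and `LINE_RE` are hypothetical names, and it assumes that the quoted fields are referer, user agent, client IP, Ray ID and country (in that order) and that the first two trailing numbers are the request time and upstream response time in seconds. The actual `log_format` is not shown here, so that field order is a guess.

```python
import re

# Timeout quoted in the question; used only to flag slow requests.
CLOUDFLARE_TIMEOUT_S = 100.0

# Assumed field order (not confirmed by the log_format, which is not shown).
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"[^"]*" "(?P<agent>[^"]*)" "(?P<client_ip>[^"]*)" '
    r'"(?P<ray_id>[^"]*)" "(?P<country>[^"]*)" '
    r'(?P<request_time>[\d.]+) (?P<upstream_time>[\d.]+)'
)


def parse_line(line):
    """Return a dict of fields from one access-log line, or None if it does not match."""
    # Pasted logs sometimes carry typographic quotes; normalize them first.
    normalized = line.replace("\u201c", '"').replace("\u201d", '"')
    match = LINE_RE.search(normalized)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["request_time"] = float(record["request_time"])
    record["upstream_time"] = float(record["upstream_time"])
    # True when the origin claims it finished before the proxy timeout.
    record["within_timeout"] = record["request_time"] < CLOUDFLARE_TIMEOUT_S
    return record


SAMPLE = (
    'May 27 23:24:16 nginx default[api-dosfo-5bb79d4d48-22b8j] 10.244.8.59 - - '
    '[27/May/2020:23:24:16 +0000] "GET /someurl HTTP/1.1" 200 30499 "-" '
    '"curl/7.64.1" "2a01:4b00:864d:7a00:2cc4:5fe7:f6ac:9313" '
    '"59a37bf77d02002a-LHR" "GB" 44.031 43.796 : 0.228 .'
)

record = parse_line(SAMPLE)
```

Running this over the whole access log and filtering on the Ray IDs that returned a 524 to the client would show whether every one of them has `within_timeout` set, which is the mismatch described here.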
Other Ray IDs with the same issue (the client gets a 524, but the server logs show that the request was processed before the timeout):
What could be the issue here?
We’re running DigitalOcean Kubernetes behind Cloudflare, and we only see these errors when the Cloudflare proxy is enabled.
It’s very difficult for me at the moment to reproduce the issue on demand, as I don’t know what’s causing it. There doesn’t seem to be any pattern to the errors, and it’s only affecting 1% of requests.
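Since the failures cannot be triggered on demand, one option is to poll the endpoint repeatedly and record the status, the `cf-ray` response header and the elapsed time for each call, so that every 524 comes with a Ray ID and a timestamp to match against the origin logs. This is a rough sketch using only the Python standard library; `probe_once` and `failures` are hypothetical helper names, and the URL passed in is a placeholder for the real endpoint.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, timeout=120.0):
    """Issue one GET and return (status, cf_ray, elapsed_seconds).

    The client timeout is set above 100 s so the proxy's 524 arrives
    before the client gives up. Connection-level errors (DNS, TLS,
    socket timeout) are not caught here and will propagate.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status and headers are on the exception.
        status, headers = err.code, err.headers
    return status, headers.get("cf-ray"), time.monotonic() - start


def failures(results):
    """Keep only the probes that came back as 524."""
    return [r for r in results if r[0] == 524]


# Example usage (placeholder URL, not run here):
#   results = [probe_once("https://example.com/someurl") for _ in range(500)]
#   for status, ray, elapsed in failures(results):
#       print(status, ray, round(elapsed, 1))
```

The elapsed time is the interesting part: a 524 that arrives after roughly 100 seconds while the origin logged a response at about 44 seconds would suggest the response was lost somewhere between the origin and the proxy.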
The request flow is Cloudflare => DigitalOcean load balancer => Kubernetes HAProxy ingress. The log above is from a Kubernetes pod on the “edge” of our cluster. The cluster is using Cilium and CoreDNS.
If it is a networking issue, such as packet loss between Cloudflare and the origin, how can I diagnose it further?
Thanks for any help!