When I disable the DNS proxy and access my website via HTTP, I never get any hung connections.
When I curl from my personal laptop to the domain, everything works every time.
When I curl from a pod in my GKE cluster via the Cloudflare domain, ~50% of requests hang completely, eventually returning after ~2 minutes.
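To put a number on the "~50%" figure, here is a minimal Python sketch that could be run from a pod. It times repeated requests and reports how many were slow. The helper names (`timed_get`, `slow_fraction`, `measure`), the 10-second threshold, and the 20-request count are illustrative assumptions, not part of my original test, which used plain curl.

```python
import time
import urllib.request


def timed_get(url, timeout=180.0):
    """Issue one GET and return (seconds_elapsed, error_or_None)."""
    start = time.monotonic()
    err = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except Exception as exc:  # timeouts, resets, HTTP errors
        err = exc
    return time.monotonic() - start, err


def slow_fraction(durations, threshold=10.0):
    """Fraction of requests that took longer than `threshold` seconds."""
    if not durations:
        return 0.0
    return sum(d > threshold for d in durations) / len(durations)


def measure(url, count=20):
    """Time `count` requests to `url` and print a per-request summary."""
    durations = []
    for _ in range(count):
        elapsed, err = timed_get(url)
        durations.append(elapsed)
        status = "ok" if err is None else repr(err)
        print(f"{elapsed:7.2f}s  {status}")
    print(f"slow requests: {slow_fraction(durations):.0%}")
    return durations
```

Calling `measure("https://XXX.YYY/")` with the real domain from inside the cluster and again from a laptop should make the difference between the two vantage points easy to compare.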
This issue started within the last 48 hours and is also reflected in the Google Cloud uptime checks for my domain.
It seems like requests originating from Google Cloud are spuriously timing out. Is there a WAF rule I can disable (I have none), or some other way to allow traffic from Google without these very long delays?
I realize now that the issue is more specific. When I resolve my domain, I get two IPs:
Non-authoritative answer:
Server: UnKnown
Address: 192.168.86.1
Name: XXX.YYY
Addresses: 104.21.59.47
172.67.213.204
From Google Cloud, connections to the 104.* IP fail 100% of the time, while connections to the 172.* IP succeed 100% of the time. Neither of these IPs is under my control. Is there a way to remove the 104.* address as a candidate for my domain?
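For anyone who wants to reproduce the per-IP behaviour, here is a rough Python sketch that attempts a TCP connection to each address separately and counts successes. The function names, the 10-second timeout, and the attempt count are my own illustrative choices. It only checks that the TCP connection opens on port 443, so it does not exercise TLS or HTTP.

```python
import socket
import time


def probe(ip, port=443, timeout=10.0, connect=socket.create_connection):
    """Try one TCP connection to ip:port; return (ok, seconds, error_text)."""
    start = time.monotonic()
    try:
        sock = connect((ip, port), timeout=timeout)
    except OSError as exc:  # covers timeouts and refused connections
        return False, time.monotonic() - start, str(exc)
    sock.close()
    return True, time.monotonic() - start, ""


def probe_all(ips, attempts=5, **kwargs):
    """Return {ip: number of successful connects out of `attempts`}."""
    return {
        ip: sum(probe(ip, **kwargs)[0] for _ in range(attempts))
        for ip in ips
    }
```

Running `probe_all(["104.21.59.47", "172.67.213.204"])` from a pod and from a laptop would show whether the failures are tied to one address. For a full HTTPS request pinned to a single address, `curl --resolve XXX.YYY:443:104.21.59.47 https://XXX.YYY/` does the same job with the real domain substituted.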
To provide additional evidence, here’s the latency uptime check from the perspective of Google’s health checks. We have multiple domains, and they all started failing at exactly the same time. This supports the theory that something broke independently of our infrastructure, as we did not release any changes during this time window.