I’m not sure this is the best place to post this, but for the last few days I’ve noticed that a significant number of websites that use Cloudflare are having issues on several Utah ISPs (and perhaps in other locations).
I opened a support ticket for this issue but was told there’s nothing that can be done on Cloudflare’s side since they can’t reproduce the issue internally and everything looks good to them, so it must be my network. With that said, after asking around, I’ve had several friends and clients confirm they’re also experiencing this issue. I figure it can’t just be me at this point, so I thought I’d post here to get some guidance (and hopefully get the attention of someone at Cloudflare).
After looking into the issue a bit, it seems that the DNS responses for these broken websites contain three Cloudflare IP addresses. Two of the IP addresses accept TCP connections on ports 80 and 443, but the third doesn’t. Every IP address I’ve found that doesn’t accept TCP connections begins with 172.67 (although not all IP addresses in 172.67.0.0/16 exhibit this issue).
Because one of the three IP addresses may not work, performance in modern browsers is incredibly poor whenever the broken IP is the first record. In the case of Chrome on macOS, the browser will spend 75 seconds trying to connect to the first IP before moving on to the second one. And because Chrome only caches this failover for around a minute, most affected websites are essentially unusable. Firefox has a similar timeout. Safari seems to give up much more quickly based on a dynamic value, which in my tests was around 300 ms.
The official Git website happens to be affected, so I’ll use that as an example. First, get the IP addresses of the affected website:
    $ dig git-scm.com
    git-scm.com. 269 IN A 126.96.36.199
    git-scm.com. 269 IN A 188.8.131.52
    git-scm.com. 269 IN A 184.108.40.206
Notice that one of the IP addresses begins with 172.67. A quick way to check whether the ISP you’re using is affected is to navigate to http://<ip address>/ in a browser. If your connection is affected, the request will time out when trying to connect to the IP address beginning with 172.67 (so in this case, http://220.127.116.11). If you’re not affected, the request will show a 1003 error, which is expected since we’re accessing the IP address directly without sending Cloudflare the hostname.
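If you’d rather script that browser check, here’s a rough sketch using curl. The function names (`classify_curl_exit`, `probe_ip`) and the 5-second connect timeout are just my own choices for illustration. Exit codes 7 and 28 are curl’s documented codes for a failed connection and a timeout. Since the sketch doesn’t pass `-f`, any HTTP response (including Cloudflare’s 1003 page) makes curl exit 0 and counts as reachable.

```shell
# Map a curl exit code to a short verdict.
classify_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;       # got some HTTP response (a 1003 page is fine)
    7)  echo "connect-failed" ;;  # curl: failed to connect to host
    28) echo "timeout" ;;         # curl: operation timed out
    *)  echo "other" ;;
  esac
}

# Probe a bare IP over plain HTTP, giving up on the connect after 5 seconds
# (an arbitrary value) instead of waiting for the OS default.
probe_ip() {
  curl -s -o /dev/null --connect-timeout 5 "http://$1/"
  classify_curl_exit "$?"
}
```

Running `probe_ip <ip address>` should print `timeout` on an affected connection and `reachable` otherwise.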
Alternatively, here’s an example of using curl on the command line to verify this. If you try to connect to the Git website using the broken IP, the connection will time out after around 75 seconds.
    $ curl -Lvso /dev/null https://git-scm.com --connect-to ::18.104.22.168
    * Connecting to hostname: 22.214.171.124
    *   Trying 126.96.36.199...
    * TCP_NODELAY set
    * Connection failed
    * connect to 188.8.131.52 port 443 failed: Operation timed out
    * Failed to connect to 184.108.40.206 port 443: Operation timed out
    * Closing connection 0
If you try that same request using one of the other IP addresses, everything works fine:
    $ curl -Lvso /dev/null https://git-scm.com --connect-to ::220.127.116.11
    * Connecting to hostname: 18.104.22.168
    *   Trying 22.214.171.124...
    * TCP_NODELAY set
    * Connected to 126.96.36.199 (188.8.131.52) port 443 (#0)
    ...
    < HTTP/2 200
    ...
    * Connection #0 to host 184.108.40.206 left intact
    * Closing connection 0
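To run the same curl test against every A record in one go, something like the sketch below might help. It assumes `dig` and `curl` are installed. The helper names (`in_172_67`, `only_ipv4`, `check_host`) and the 5-second connect timeout are made up for this example. It uses `--connect-to "::$ip:"`, the fully spelled-out form of the option used above, and flags addresses in 172.67.0.0/16 so it’s easy to see whether failures line up with that range.

```shell
# Succeeds if the IPv4 address is inside 172.67.0.0/16.
in_172_67() {
  case "$1" in
    172.67.*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Keep only lines that look like IPv4 addresses
# (dig +short can also print CNAME targets).
only_ipv4() {
  grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

# Try an HTTPS request to the host through each of its A records in turn.
check_host() {
  host="$1"
  for ip in $(dig +short A "$host" | only_ipv4); do
    curl -s -o /dev/null --connect-timeout 5 \
      --connect-to "::$ip:" "https://$host/"
    rc=$?
    if in_172_67 "$ip"; then note="172.67.x.x"; else note="other range"; fi
    if [ "$rc" -eq 0 ]; then
      printf '%s\t%s\tok\n' "$ip" "$note"
    else
      printf '%s\t%s\tFAILED (curl exit %s)\n' "$ip" "$note" "$rc"
    fi
  done
}
```

Calling `check_host git-scm.com` prints one line per address. On an affected connection I’d expect the failing line to show curl exit 28 (timeout).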
Can anyone in Utah (or anywhere really) confirm whether they’re seeing similar results? Or does anyone have advice on what I can try next?
I’ve contacted my ISP and they are seeing the issue as well, so they said they’d try to get in touch with Cloudflare to resolve the issue. They’re a smaller ISP though, so they told me they think I have a better chance of getting it fixed since I use Cloudflare for my own websites (some of which are also affected).