Many of Cloudflare's 172.67.0.0/16 IP addresses won't accept HTTP/HTTPS connections when routed to SLC

I’m not sure where the best place to post this is, but for the last few days I’ve noticed that a significant number of websites that use Cloudflare are having connectivity issues on several Utah ISPs (and perhaps in other locations).

I opened a support ticket for this issue but was told there’s nothing that can be done on Cloudflare’s side since they can’t reproduce the issue internally and everything looks good to them, so it must be my network. With that said, after asking around, I’ve had several friends and clients confirm they’re also experiencing this issue. I figure it can’t just be me at this point, so I thought I’d post here to get some guidance (and hopefully get the attention of someone at Cloudflare).

After looking into the issue a bit, it seems that the DNS responses for these broken websites contain three Cloudflare IP addresses. Two of the IP addresses accept TCP connections on ports 80 and 443, but the third doesn’t. Whenever one of the IP addresses doesn’t accept TCP connections, it happens to begin with 172.67. (Although not all 172.67.0.0/16 IP addresses are exhibiting this issue.)
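The check described above (resolve the hostname, then see which of the returned addresses complete a TCP handshake on ports 80 and 443) can be scripted. This is a minimal sketch, assuming Python 3.9+ and only the standard library; probe, resolve_ipv4, tcp_port_open and the 3-second timeout are illustrative choices, not anything Cloudflare provides. It uses the system resolver and IPv4 only, so the address order may differ from what dig shows.

```python
import ipaddress
import socket

# Prefix from this report; not every address in it was affected.
SUSPECT_PREFIX = ipaddress.ip_network("172.67.0.0/16")


def resolve_ipv4(hostname: str) -> list[str]:
    """Distinct IPv4 addresses for hostname, in the order the resolver returns them."""
    addrs: list[str] = []
    for *_, sockaddr in socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM):
        if sockaddr[0] not in addrs:
            addrs.append(sockaddr[0])
    return addrs


def tcp_port_open(ip: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP handshake with ip:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


def probe(hostname: str, ports: tuple[int, ...] = (80, 443), timeout: float = 3.0) -> None:
    """Print, for each resolved address, whether each port accepts a TCP connection."""
    for ip in resolve_ipv4(hostname):
        suspect = ipaddress.ip_address(ip) in SUSPECT_PREFIX
        states = ", ".join(
            f"{port}={'open' if tcp_port_open(ip, port, timeout) else 'no connect'}"
            for port in ports
        )
        print(f"{ip}{' [172.67.0.0/16]' if suspect else ''}: {states}")
```

For example, probe("git-scm.com") prints one line per address; an address tagged [172.67.0.0/16] showing "no connect" on both ports would match the symptom described here.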

Because one of the three IP addresses may not work, performance in modern browsers is extremely poor whenever the broken IP is the first record. Chrome on macOS, for example, spends 75 seconds trying to connect to the first IP before moving on to the second one, and because Chrome only caches this failover for around a minute, most affected websites are essentially unusable. Firefox has a similar timeout. Safari gives up much more quickly, based on a dynamic value that was around 300 ms in my tests.
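The cost of a broken first record follows directly from the per-attempt timeout: a client that tries records strictly in order waits out the full timeout before trying the next one, so with the broken address first, a 75-second timeout delays the connection by at least 75 seconds, while a 300 ms one delays it by roughly 0.3 seconds. The Python sketch below models only that sequential behavior; connect_in_order is a hypothetical helper and a simplification, not how Chrome, Firefox or Safari actually implement fallback (Safari’s short dynamic delay looks like Happy Eyeballs-style racing as in RFC 8305, but that is an inference).

```python
import socket
import time

Endpoint = tuple[str, int]  # (ip, port)


def connect_in_order(endpoints: list[Endpoint], per_attempt_timeout: float) -> tuple[Endpoint, float]:
    """Try endpoints strictly in order, moving on only after an attempt fails or
    times out. Returns (endpoint that connected, seconds elapsed)."""
    start = time.monotonic()
    last_error = None
    for endpoint in endpoints:
        try:
            sock = socket.create_connection(endpoint, timeout=per_attempt_timeout)
        except OSError as exc:
            last_error = exc  # refused, unreachable, or timed out: try the next record
            continue
        sock.close()
        return endpoint, time.monotonic() - start
    raise OSError(f"all {len(endpoints)} endpoints failed; last error: {last_error}")
```

With a silently dropped address first and per_attempt_timeout=75, the elapsed time returned would be at least 75 seconds; with 0.3 it would be about 0.3 seconds plus the handshake to the next address.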

The official Git website happens to be affected, so I’ll use that as an example. First, get the IP addresses of the affected website:

$ dig git-scm.com
git-scm.com.		269	IN	A	104.22.3.43
git-scm.com.		269	IN	A	172.67.12.172
git-scm.com.		269	IN	A	104.22.2.43

Notice that one of the IP addresses begins with 172.67. A quick way to check whether your ISP is affected is to navigate to http://<ip address>/. If your connection is affected, the request will time out when connecting to the IP address beginning with 172.67 (in this case, http://172.67.12.172). If you’re not affected, the request will show a Cloudflare 1003 error, which is expected, since we’re accessing the IP address directly without sending Cloudflare the site’s hostname.
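The browser check can also be run from a script. This is a minimal sketch using Python’s standard http.client; probe_bare_ip and the 5-second timeout are hypothetical choices. It sends GET / to the bare IP (so the Host header is just the IP), which is why a working Cloudflare address answers with an error page rather than a website; all that matters is whether any HTTP response arrives before the timeout.

```python
import http.client
import socket


def probe_bare_ip(ip: str, port: int = 80, timeout: float = 5.0) -> str:
    """Request http://<ip>/ and describe the outcome as a short string."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        return f"HTTP {resp.status}"  # any status means TCP and HTTP got through
    except socket.timeout:  # must come before OSError, since it is a subclass
        return "timed out"
    except OSError as exc:
        return f"failed: {exc}"
    finally:
        conn.close()
```

An affected address would be expected to return "timed out", while an unaffected one returns some HTTP status for the error page.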

Alternatively, you can verify this with curl on the command line. If you try to connect to the Git website using the broken IP, the connection times out after around 75 seconds.

$ curl -Lvso /dev/null https://git-scm.com --connect-to ::172.67.12.172
* Connecting to hostname: 172.67.12.172
*   Trying 172.67.12.172...
* TCP_NODELAY set
* Connection failed
* connect to 172.67.12.172 port 443 failed: Operation timed out
* Failed to connect to 172.67.12.172 port 443: Operation timed out
* Closing connection 0

If you try that same request using one of the other IP addresses, everything works fine:

$ curl -Lvso /dev/null https://git-scm.com --connect-to ::104.22.2.43
* Connecting to hostname: 104.22.2.43
*   Trying 104.22.2.43...
* TCP_NODELAY set
* Connected to 104.22.2.43 (104.22.2.43) port 443 (#0)
...
< HTTP/2 200
...
* Connection #0 to host 104.22.2.43 left intact
* Closing connection 0
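Both curl runs above pin the hostname to one chosen address. The Python sketch below does the same by hand: it opens the TCP connection to the chosen IP, performs TLS with the site’s hostname as SNI, sends a HEAD request, and gives up on the TCP handshake after connect_timeout seconds instead of waiting around 75 seconds (curl’s own --connect-timeout option does the same for the commands above). https_via_ip, the port parameter and the 5-second default are illustrative assumptions; the function returns the first line of the first chunk received, which is normally the HTTP status line.

```python
import socket
import ssl


def https_via_ip(hostname: str, ip: str, port: int = 443, connect_timeout: float = 5.0) -> str:
    """Connect to ip:port, but speak TLS/HTTP for hostname (SNI, cert check, Host
    header), roughly like `curl --connect-to ::<ip> https://<hostname>`."""
    try:
        raw = socket.create_connection((ip, port), timeout=connect_timeout)
    except OSError as exc:
        # A non-responding address fails here after connect_timeout seconds.
        return f"connect to {ip} port {port} failed: {exc}"
    ctx = ssl.create_default_context()  # verifies the certificate against hostname
    with raw, ctx.wrap_socket(raw, server_hostname=hostname) as tls:
        request = f"HEAD / HTTP/1.1\r\nHost: {hostname}\r\nConnection: close\r\n\r\n"
        tls.sendall(request.encode("ascii"))
        first_chunk = tls.recv(4096)
    # First line of the response, normally the status line.
    return first_chunk.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

For instance, https_via_ip("git-scm.com", "104.22.2.43") should return a status line from a working address, while the 172.67.12.172 address on an affected network would return the "connect ... failed" message after five seconds.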

Can anyone in Utah (or anywhere really) confirm whether they’re seeing similar results? Or does anyone have advice on what I can try next?

I’ve contacted my ISP and they are seeing the issue as well, so they said they’d try to get in touch with Cloudflare to resolve the issue. They’re a smaller ISP though, so they told me they think I have a better chance of getting it fixed since I use Cloudflare for my own websites (some of which are also affected).

After several days of persistence (right as I was about to give up), I was able to convince Cloudflare Support to escalate this to their network engineers. Cloudflare then confirmed the problem was on their side, and it was resolved overnight within a few hours.

With that said, I’d be interested to know what caused this to happen and what the best way to resolve it would have been. It seems odd that a large number of Cloudflare-hosted websites can be almost completely unusable for a specific geographic region for days at a time without some kind of monitoring system detecting the problem. I also find it strange that it wasn’t added as an incident on the status page once it was identified, especially since it persisted for several days, caused what I’d consider pretty severe connectivity and reliability problems for various Cloudflare-hosted websites, and affected users of several ISPs.

In any case, I’m glad it’s fixed now and happy with how quickly it was resolved once it reached the network engineering team.

I faced the same issue a few days ago when I tried to connect to BOM (Mumbai).
My website also resolves to three IPs:
104.26.x.x
104.26.x.x
172.67.x.x

The 172.67.x.x address was set as the first record, and requests to it were timing out. The other two were working, but at that time my website was resolving to 172.67.x.x, so it was down. I opened ticket #1956622, but it didn’t get any attention.

It’s definitely not normal for an IP to be unreachable like this. The way this particular failure happened is extremely rare (I have never seen it before in all my time here), so the Support team weren’t able to replicate the problem with the data or tools we had. Once we were able to correlate a second report from another customer, we escalated, and the Networking team found and fixed the issue. We’ll look at this internally to see what we could do better next time.

If you do see this again, definitely continue to contact support about it. It’s useful to provide the following:

  1. The source IP address you see the issue from
  2. A traceroute to the failing IP, e.g. traceroute 1.2.3.4
  3. A traceroute to a working IP, e.g. traceroute 4.5.6.7
  4. A packet capture of the connection attempt, if you can
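For items 2 to 4 above, the commands might look like the following on macOS or Linux. This Python sketch only builds and prints them; diagnostic_commands and the capture file name are hypothetical, the example IPs are the ones from the original post, and tcpdump normally needs root (sudo) and should be stopped with Ctrl-C after reproducing the failed connection. Item 1, your public source IP, has to be looked up separately.

```python
import shlex


def diagnostic_commands(failing_ip: str, working_ip: str, port: int = 443) -> list[list[str]]:
    """Hypothetical helper: argv lists for the two traceroutes and a packet capture."""
    return [
        ["traceroute", failing_ip],
        ["traceroute", working_ip],
        # Capture only traffic to the failing address on the port being tested.
        ["tcpdump", "-n", "-w", "connect-attempt.pcap",
         "host", failing_ip, "and", "tcp", "port", str(port)],
    ]


# Print shell-ready versions of the commands for the addresses from the report.
for cmd in diagnostic_commands("172.67.12.172", "104.22.2.43"):
    print(shlex.join(cmd))
```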

If you contact support with this information, we will work with the Network team to understand what might be happening.


@simon, so was my issue also fixed?

@user3011, are you still seeing an issue connecting to that IP? If so, please do contact support with the information I mentioned above.

No, I’m not. But I can’t be sure, because I run an international website and I don’t have access to every ISP. I’m just asking whether this issue was resolved only for ryanp or for everyone, including my website.

The original report was about users in Salt Lake City, US and you are referring to India, so I don’t think the two are related.

We are not aware of any issues like this in India and have had no other reports of it, so the only thing to do would be to gather the information I mentioned from an impacted user while the issue is ongoing.
