What is the name of the domain?
9gag.com, chat.godotengine.org
What is the issue you’re encountering
Packet loss and large TLS timeouts
What steps have you taken to resolve the issue?
IMPORTANT: I’m writing this as an end-user, not as a Cloudflare customer or site owner.
For the last couple of weeks, some websites have become so slow that they are annoying to browse at all.
Sometimes they work fine, then suddenly a page takes a minute to load or times out.
I’m browsing from Argentina, and this problem happens with two major ISPs (Movistar and Personal) but not with another (Claro).
After some research, what these websites have in common is that they are all served from AS13335 (Cloudflare).
For example, 9gag[dot]com and godotengine.org are both served from the same ASN and show the same symptoms: the website loads fine, I reload, and then half of the content on the site takes minutes to load or doesn’t load at all. I refresh again, same problem. I refresh again and suddenly everything loads alright.
I’m going to use 9gag for the examples. But here’s the weird part: the problem appears to be related to the TLS port (TCP 443).
For example here’s ping to 9gag[dot]com (which uses AS13335) a few days ago:
ping 9gag[dot]com
64 bytes from 104.16.103.144 (104.16.103.144): icmp_seq=61 ttl=53 time=27.1 ms
--- 9gag[dot]com ping statistics ---
61 packets transmitted, 50 received, 18.0328% packet loss, time 60618ms
rtt min/avg/max/mdev = 22.427/26.088/54.440/4.475 ms
This is ping to the same website right now as I’m typing:
ping 9gag[dot]com
64 bytes from 104.16.107.144 (104.16.107.144): icmp_seq=60 ttl=59 time=8.57 ms
--- 9gag[dot]com ping statistics ---
60 packets transmitted, 59 received, 1,66667% packet loss, time 59091ms
rtt min/avg/max/mdev = 8.391/8.879/9.732/0.287 ms
Personally I think there should be 0% packet loss. Maybe some loss is reasonable: after all, if the server is busy it may not respond to every ping. But I get 0% packet loss to other addresses, like 8.8.8.8 or yahoo[dot]com or facebook[dot]com. I also get 0% packet loss when pinging from an AWS EC2 instance, though on the other hand the ping time from that instance to Cloudflare is below 1 ms, so it is not a like-for-like comparison.
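For anyone who wants to compare their own runs against mine, here is a minimal Python sketch that recomputes the loss percentage from the ping summary lines above. The function name `packet_loss` is just something I made up for illustration, and it assumes the Linux `ping` summary format shown in this post.

```python
import re

def packet_loss(summary: str) -> float:
    """Return packet loss in percent from a Linux ping summary line."""
    m = re.search(r"(\d+) packets transmitted, (\d+) received", summary)
    if m is None:
        raise ValueError("not a ping summary line")
    sent, received = int(m.group(1)), int(m.group(2))
    return 100.0 * (sent - received) / sent

# The two summary lines quoted above (the second uses a decimal comma
# because of my locale, which is why the counts are parsed instead).
earlier = "61 packets transmitted, 50 received, 18.0328% packet loss, time 60618ms"
now = "60 packets transmitted, 59 received, 1,66667% packet loss, time 59091ms"

print(f"{packet_loss(earlier):.2f}%")  # 18.03%
print(f"{packet_loss(now):.2f}%")      # 1.67%
```

Working from the transmitted/received counts rather than the printed percentage avoids the locale-dependent decimal separator.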
The TTL looks normal so it’s not like it gets detoured through a lot of nodes. Some packets are simply being dropped.
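As a rough check of the TTL reasoning, the number of routers crossed can be estimated from the TTL of the reply. This is only a sketch: it assumes the replying host started from one of the common initial TTL values (64, 128 or 255), which I have not verified for Cloudflare, and `estimated_hops` is a hypothetical helper name.

```python
def estimated_hops(observed_ttl: int, initial_ttls=(64, 128, 255)) -> int:
    """Estimate how many routers decremented the TTL of a reply."""
    # Assumption: the sender used the smallest common initial TTL
    # that is not below the value we observed.
    initial = min(t for t in initial_ttls if t >= observed_ttl)
    return initial - observed_ttl

print(estimated_hops(53))  # 11 (ping from a few days ago)
print(estimated_hops(59))  # 5  (ping from today)
```

Under that assumption, both values point to a short path, in line with the 6 to 7 hops that traceroute reports.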
Here are multiple traceroute runs:
traceroute 9gag[dot]com
traceroute to 9gag[dot]com (104.16.106.144), 30 hops max, 60 byte packets
1 _gateway (192.168.1.3) 0.317 ms 0.499 ms 0.659 ms
2 200.51.241.1 (200.51.241.1) 3.750 ms 3.805 ms 3.841 ms
3 * 213.140.39.117 (213.140.39.117) 11.255 ms 11.281 ms
4 213.140.39.118 (213.140.39.118) 11.185 ms 11.290 ms *
5 * * *
6 * * *
7 104.16.106.144 (104.16.106.144) 9.049 ms 9.074 ms 9.430 ms
traceroute 9gag[dot]com
traceroute to 9gag[dot]com (104.16.103.144), 30 hops max, 60 byte packets
1 _gateway (192.168.1.3) 0.331 ms 0.506 ms 0.657 ms
2 200.51.241.1 (200.51.241.1) 3.308 ms 3.360 ms 3.589 ms
3 213.140.39.119 (213.140.39.119) 10.103 ms 10.509 ms *
4 213.140.39.116 (213.140.39.116) 10.444 ms * 213.140.39.118 (213.140.39.118) 10.994 ms
5 * 94.142.103.101 (94.142.103.101) 11.593 ms *
6 104.16.103.144 (104.16.103.144) 11.528 ms 8.511 ms *
traceroute 9gag[dot]com
traceroute to 9gag[dot]com (104.16.105.144), 30 hops max, 60 byte packets
1 _gateway (192.168.1.3) 0.374 ms 0.560 ms 0.722 ms
2 200.51.241.1 (200.51.241.1) 4.153 ms 4.195 ms 4.380 ms
3 213.140.39.117 (213.140.39.117) 21.801 ms 21.834 ms *
4 213.140.39.118 (213.140.39.118) 10.713 ms * *
5 * * *
6 104.16.105.144 (104.16.105.144) 11.660 ms cloudflare-ae70-0-grtbueba1.net.telefonicaglobalsolutions[dot]com (94.142.103.101) 41.449 ms 104.16.105.144 (104.16.105.144) 8.481 ms
Everything looks alright, except perhaps that it seems to be hopping to Telefonica in Spain (that’s quite the detour?).
But the real problem manifests once I try to access the TLS port using tcptraceroute:
sudo tcptraceroute img-9gag-fun.9cache[dot]com 443 -l 50
Running:
traceroute -T -O info -p 443 img-9gag-fun.9cache[dot]com 50
traceroute to img-9gag-fun.9cache[dot]com (104.17.53.70), 30 hops max, 60 byte packets
1 _gateway (192.168.1.3) 0.348 ms 0.495 ms 0.653 ms
2 200.51.241.1 (200.51.241.1) 2.506 ms 2.705 ms 3.042 ms
3 213.140.39.117 (213.140.39.117) 24.969 ms 213.140.39.119 (213.140.39.119) 11.099 ms 11.139 ms
4 * * 213.140.39.118 (213.140.39.118) 11.288 ms
5 5.53.7.242 (5.53.7.242) 11.319 ms cloudflare-ae70-0-grtbueba1.net.telefonicaglobalsolutions[dot]com (94.142.103.101) 11.870 ms *
6 104.17.53.70 (104.17.53.70) <syn,ack> 11.694 ms 1009.146 ms 8.809 ms
WAIT! THERE IT IS!
The server 104.17.53.70 answered two of those probes in 11.69 ms and 8.81 ms (expected), but the remaining one took a sudden 1009.15 ms.
Not all runs of tcptraceroute manifest this large variance (sometimes I get consistent timing) but I can consistently replicate these findings around 1 out of 3 tries.
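If someone on Movistar or Personal wants to try reproducing this without tcptraceroute, here is a minimal Python sketch that times plain TCP connections to port 443 and reports how many were slow. The helper names, the 500 ms threshold and the default of 10 attempts are my own arbitrary choices, not anything standard, and this times the whole TCP handshake as seen by the OS rather than individual SYN probes.

```python
import socket
import time

def connect_times_ms(host, port=443, attempts=10, timeout=5.0):
    """Time full TCP handshakes; None marks an attempt that failed or timed out."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:  # includes timeouts and refused connections
            samples.append(None)
    return samples

def slow_fraction(samples, threshold_ms=500.0):
    """Fraction of attempts that failed or took longer than threshold_ms."""
    if not samples:
        return 0.0
    slow = sum(1 for s in samples if s is None or s > threshold_ms)
    return slow / len(samples)

# The three probe times from the tcptraceroute run above:
print(slow_fraction([11.694, 1009.146, 8.809]))  # 0.3333333333333333
```

To measure live, pass the hostname under test to `connect_times_ms` and feed the result into `slow_fraction`. Running it from an affected ISP and from an unaffected one (Claro, in my case) should make the difference visible.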
Does anyone know WHO to contact to solve this issue?
I want to contact my ISP but on the other hand this issue seems to be Cloudflare specific; and I want to do as much research as possible before my ISP answers back with “It’s Cloudflare’s problem” and Cloudflare answers back with “It’s your ISP’s problem”.
It’s also troubling that two ISPs (major ISPs in my country) are showing these symptoms.
Also I can’t find a way to directly contact Cloudflare since it’s not an issue with my website and not a security issue either.
Any help is appreciated.
Thanks and cheers!