Heavy traffic loss for Telstra and other Australian ISPs


#1

Telstra (and other ISPs that route via them, like TPG) are experiencing packet loss to 1.1.1.1/1.0.0.1 (though not Cloudflare’s network in general).

It seems to affect all traffic: tcp/53, udp/53 (DNS) and tcp/443.
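To measure the udp/53 loss without going through a resolver stub, a minimal sketch like the following could send the same `id.server` TXT query seen in the captures and count how many get answered. It uses only the Python standard library. The CHAOS query class (the usual convention for `id.server`), the 2 s timeout, the attempt count and the function names are my assumptions for illustration.

```python
import secrets
import socket
import struct


def build_query(name: str, qtype: int = 16, qclass: int = 3) -> tuple[int, bytes]:
    """Build a minimal DNS query without EDNS. Defaults (assumed): TXT (16), class CH (3)."""
    qid = secrets.randbits(16)
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME as length-prefixed labels, terminated by the zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return qid, header + qname + struct.pack("!HH", qtype, qclass)


def is_response_to(qid: int, data: bytes) -> bool:
    """True if data looks like a DNS response (QR bit set) carrying our query ID."""
    if len(data) < 12:
        return False
    rid, flags = struct.unpack("!HH", data[:4])
    return rid == qid and bool(flags & 0x8000)


def probe_udp(server: str, attempts: int = 10, timeout: float = 2.0) -> int:
    """Send `attempts` queries one at a time; return how many were answered."""
    answered = 0
    for _ in range(attempts):
        qid, query = build_query("id.server")
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            try:
                data, _ = sock.recvfrom(512)
            except socket.timeout:
                continue  # no reply within the timeout: count it as dropped
            if is_response_to(qid, data):
                answered += 1
    return answered
```

For example, `probe_udp("1.0.0.1")` returning fewer than 10 means some of the ten queries got no answer within 2 seconds.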

ICMP echo traffic used to get through, but it is now dropped entirely:

$ ping -c 5 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.

--- 1.1.1.1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 82ms

DNS queries are answered only sporadically; many seem to be dropped:

Capturing on 'wlp4s0'
    1 0.000000000 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x734a TXT id.server
    2 0.021415840      1.0.0.1 → 192.168.1.52 DNS 85 Standard query response 0x734a TXT id.server TXT
    3 1.568072052 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xc803 TXT id.server
    4 1.616258222      1.0.0.1 → 192.168.1.52 DNS 85 Standard query response 0xc803 TXT id.server TXT
    5 2.696483072 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xe7d8 TXT id.server
    6 7.696550420 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xe7d8 TXT id.server
    7 12.696814561 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xe7d8 TXT id.server
    8 24.090338235 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x2409 TXT id.server
    9 29.090169851 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x2409 TXT id.server
  10 34.090287655 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x2409 TXT id.server
  11 44.416306196 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x1e42 TXT id.server
  12 44.437647257      1.0.0.1 → 192.168.1.52 DNS 85 Standard query response 0x1e42 TXT id.server TXT
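Reading a capture like this by eye is error-prone, so a small parser can tally, per query ID, how many times it was sent and whether an answer came back. This is a sketch that assumes tshark's default one-line summary format exactly as pasted above; the regex and the `summarize` name are my own, not anything tshark provides.

```python
import re
from collections import defaultdict

# Matches tshark one-line summaries such as
# "5 2.696483072 192.168.1.52 → 1.0.0.1 DNS 69 Standard query 0xe7d8 TXT id.server"
DNS_LINE = re.compile(
    r"^\s*\d+\s+(?P<t>[\d.]+)\s+\S+\s+→\s+\S+\s+DNS\s+\d+\s+"
    r"Standard query (?P<resp>response )?(?P<qid>0x[0-9a-f]{4})"
)


def summarize(capture: str) -> dict[str, dict]:
    """Per query ID: number of sends and the reply latency (None if never answered)."""
    stats = defaultdict(lambda: {"sent": 0, "last": None, "rtt": None})
    for line in capture.splitlines():
        m = DNS_LINE.match(line)
        if not m:
            continue
        t, entry = float(m["t"]), stats[m["qid"]]
        if m["resp"]:
            if entry["last"] is not None and entry["rtt"] is None:
                entry["rtt"] = t - entry["last"]  # measured from the most recent send
        else:
            entry["sent"] += 1
            entry["last"] = t
    return dict(stats)
```

Run over the full capture above, it would show 0x734a, 0xc803 and 0x1e42 answered on the first send, while 0xe7d8 and 0x2409 were each sent three times, 5 seconds apart, with no response at all.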

TCP connection attempts are unreliable; several SYNs never get a SYN-ACK:

Capturing on 'wlp4s0'
    1 0.000000000 192.168.1.52 → 1.1.1.1      TCP 74 33930 → 443 [SYN, ECN, CWR] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973222339 TSecr=0 WS=128
    2 1.002358946 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33930 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973223342 TSecr=0 WS=128
    3 3.018468085 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33930 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973225358 TSecr=0 WS=128
    4 7.274477410 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33930 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973229614 TSecr=0 WS=128
    5 11.949399427 192.168.1.52 → 1.1.1.1      TCP 74 33932 → 443 [SYN, ECN, CWR] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973234288 TSecr=0 WS=128
    6 12.970466914 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33932 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973235310 TSecr=0 WS=128
    7 14.986534083 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33932 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973237326 TSecr=0 WS=128
    8 19.050477344 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33932 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973241389 TSecr=0 WS=128
    9 23.927635831 192.168.1.52 → 1.1.1.1      TCP 74 33934 → 443 [SYN, ECN, CWR] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973246267 TSecr=0 WS=128
  10 23.951389936      1.1.1.1 → 192.168.1.52 TCP 66 443 → 33934 [SYN, ACK, ECN] Seq=0 Ack=1 Win=29200 Len=0 MSS=1412 SACK_PERM=1 WS=1024
  11 23.951478391 192.168.1.52 → 1.1.1.1      TCP 54 33934 → 443 [ACK] Seq=1 Ack=1 Win=29312 Len=0
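The same kind of tally works for the SYNs: group them by client source port, note the gap between retransmissions, and check whether a SYN-ACK ever came back. Again this is only a sketch assuming the tshark summary format pasted above; `syn_attempts` is a hypothetical name.

```python
import re
from collections import defaultdict

# Matches lines such as
# "2 1.002358946 192.168.1.52 → 1.1.1.1 TCP 74 [TCP Retransmission] 33930 → 443 [SYN] ..."
TCP_LINE = re.compile(
    r"^\s*\d+\s+(?P<t>[\d.]+)\s+\S+\s+→\s+\S+\s+TCP\s+\d+\s+"
    r"(?:\[TCP Retransmission\]\s+)?(?P<sport>\d+)\s+→\s+(?P<dport>\d+)\s+\[(?P<flags>[^\]]+)\]"
)


def syn_attempts(capture: str) -> dict[int, dict]:
    """Group client SYNs by source port; record retransmit gaps and whether a SYN-ACK arrived."""
    conns = defaultdict(lambda: {"syn_times": [], "synack": False})
    for line in capture.splitlines():
        m = TCP_LINE.match(line)
        if not m:
            continue
        flags = m["flags"]
        if flags.startswith("SYN, ACK"):
            # Server reply: the client's port is the destination port here
            conns[int(m["dport"])]["synack"] = True
        elif flags.startswith("SYN"):
            conns[int(m["sport"])]["syn_times"].append(float(m["t"]))
    for c in conns.values():
        ts = c["syn_times"]
        c["gaps"] = [round(b - a, 2) for a, b in zip(ts, ts[1:])]
    return dict(conns)
```

On the full capture it reports no SYN-ACK for ports 33930 and 33932, each with gaps of roughly 1, 2 and 4 seconds, consistent with Linux's exponential SYN backoff from a 1-second initial RTO. Only port 33934 completes the handshake, in about 24 ms.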

Traceroute for completeness:

$ mtr -c 5 --report 1.1.1.1
Start: 2018-12-28T10:19:02+1100
HOST: x1                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.1                0.0%     5    1.6   2.5   1.6   4.4   1.1
  2.|-- 10.20.22.151               0.0%     5   23.6  29.7  19.7  49.2  12.7
  3.|-- nme-apt-bur-wgw1-be-10.tp  0.0%     5   25.6  30.9  19.6  44.6  10.5
  4.|-- 203-219-107-206.static.tp  0.0%     5   28.7  30.8  20.4  44.6  11.0
  5.|-- bundle-ether-13.win-edge9  0.0%     5   42.2  35.0  21.0  48.0  11.6
  6.|-- bundle-ether2.lon-edge901  0.0%     5   19.9  29.9  19.8  48.3  13.6
  7.|-- ???                       100.0     5    0.0   0.0   0.0   0.0   0.0
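To pick out where loss starts in an `mtr --report` table, a short helper can return the first hop whose loss exceeds a threshold. This sketch assumes the report layout shown above (note mtr drops the `%` sign at 100.0 to fit the column); `first_lossy_hop` is a hypothetical name.

```python
import re

# One hop row of `mtr --report`, e.g.
# "  7.|-- ???                       100.0     5    0.0   0.0   0.0   0.0   0.0"
HOP = re.compile(r"^\s*(?P<hop>\d+)\.\|--\s+(?P<host>\S+)\s+(?P<loss>[\d.]+)%?\s+(?P<snt>\d+)")


def first_lossy_hop(report: str, threshold: float = 0.0):
    """Return (hop, host, loss%) for the first hop whose loss exceeds `threshold`, else None."""
    for line in report.splitlines():
        m = HOP.match(line)
        if m and float(m["loss"]) > threshold:
            return int(m["hop"]), m["host"], float(m["loss"])
    return None
```

Here it returns hop 7: every hop up to and including hop 6 answers all probes, and only the final, unnamed hop is silent, which matches the 100% ping loss above. A hop that simply does not answer probes would look the same, so this is a pointer rather than proof.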

Anecdotally, all my DNS queries are slow (I use dnscrypt with Cloudflare DoH). Connecting via networks that are not routed through Telstra does not show the issue.

Two forum posts from TPG and Telstra users making independent reports:


#2

Turns out I was wrong about this: extremely slow or timed-out TCP connection attempts affect any Cloudflare-hosted site, not just 1.1.1.1:

$ curl -m 10 -X GET -I https://whirlpool.net.au
curl: (7) Failed to connect to whirlpool.net.au port 443: Connection timed out
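To check other Cloudflare-hosted sites repeatedly without curl, a sketch along these lines times fresh TCP handshakes. The 10 s default timeout mirrors `-m 10` in the curl command; the function name and attempt count are illustrative assumptions.

```python
import socket
import time


def probe_tcp(host: str, port: int = 443, attempts: int = 5, timeout: float = 10.0):
    """Try `attempts` fresh TCP handshakes; return connect times in seconds (None = failed)."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the name, then completes the three-way handshake
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:  # covers timeouts, refusals and unreachable networks
            results.append(None)
    return results
```

Each `None` in the result of `probe_tcp("whirlpool.net.au")` is a handshake that failed or timed out, the same condition curl reports above. Because name resolution is included in the timing (and DNS is also affected here), passing an IP address instead of a hostname times the handshake alone.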

#3

Thank you, and sorry for the issues. We are aware of the problem and have worked with Telstra to isolate it. If you’re still seeing issues, please do let support know.