Heavy traffic loss for Telstra and other Australian ISPs

Telstra (and other ISPs that route via them, like TPG) are experiencing packet loss to 1.1.1.1/1.0.0.1 (though not Cloudflare’s network in general).

It seems to affect all traffic: tcp/53, udp/53, and tcp/443.

ICMP echo traffic previously succeeded, but it is now dropped:

$ ping -c 5 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.

--- 1.1.1.1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 82ms

DNS queries are answered sporadically, but many appear to be dropped:

Capturing on 'wlp4s0'
    1 0.000000000 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x734a TXT id.server
    2 0.021415840      1.0.0.1 → 192.168.1.52 DNS 85 Standard query response 0x734a TXT id.server TXT
    3 1.568072052 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xc803 TXT id.server
    4 1.616258222      1.0.0.1 → 192.168.1.52 DNS 85 Standard query response 0xc803 TXT id.server TXT
    5 2.696483072 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xe7d8 TXT id.server
    6 7.696550420 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xe7d8 TXT id.server
    7 12.696814561 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0xe7d8 TXT id.server
    8 24.090338235 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x2409 TXT id.server
    9 29.090169851 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x2409 TXT id.server
  10 34.090287655 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x2409 TXT id.server
  11 44.416306196 192.168.1.52 → 1.0.0.1      DNS 69 Standard query 0x1e42 TXT id.server
  12 44.437647257      1.0.0.1 → 192.168.1.52 DNS 85 Standard query response 0x1e42 TXT id.server TXT
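
To put a number on the loss in a capture like the one above, here is a minimal sketch that counts DNS transaction IDs that never received a response. It assumes tshark's default one-line text output on stdin; the function name `dns_unanswered` is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: read tshark one-line summaries on stdin and report
# how many DNS transaction IDs were sent but never answered.
dns_unanswered() {
  awk '
    # Only look at lines that carry a hex transaction ID such as 0x734a.
    match($0, /0x[0-9a-f]+/) {
      id = substr($0, RSTART, RLENGTH)
      if ($0 ~ /Standard query response/) answered[id] = 1
      else if ($0 ~ /Standard query/)     sent[id]++
    }
    END {
      # Retries reuse the same ID, so each ID is counted once.
      for (id in sent) { total++; if (!(id in answered)) lost++ }
      printf "%d of %d transactions unanswered\n", lost, total
    }
  '
}
```

Fed the twelve frames above, this reports 2 of 5 transactions unanswered (0xe7d8 and 0x2409, each sent three times before the client gave up).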

TCP opens are not reliable:

Capturing on 'wlp4s0'
    1 0.000000000 192.168.1.52 → 1.1.1.1      TCP 74 33930 → 443 [SYN, ECN, CWR] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973222339 TSecr=0 WS=128
    2 1.002358946 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33930 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973223342 TSecr=0 WS=128
    3 3.018468085 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33930 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973225358 TSecr=0 WS=128
    4 7.274477410 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33930 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973229614 TSecr=0 WS=128
    5 11.949399427 192.168.1.52 → 1.1.1.1      TCP 74 33932 → 443 [SYN, ECN, CWR] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973234288 TSecr=0 WS=128
    6 12.970466914 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33932 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973235310 TSecr=0 WS=128
    7 14.986534083 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33932 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973237326 TSecr=0 WS=128
    8 19.050477344 192.168.1.52 → 1.1.1.1      TCP 74 [TCP Retransmission] 33932 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973241389 TSecr=0 WS=128
    9 23.927635831 192.168.1.52 → 1.1.1.1      TCP 74 33934 → 443 [SYN, ECN, CWR] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3973246267 TSecr=0 WS=128
  10 23.951389936      1.1.1.1 → 192.168.1.52 TCP 66 443 → 33934 [SYN, ACK, ECN] Seq=0 Ack=1 Win=29200 Len=0 MSS=1412 SACK_PERM=1 WS=1024
  11 23.951478391 192.168.1.52 → 1.1.1.1      TCP 54 33934 → 443 [ACK] Seq=1 Ack=1 Win=29312 Len=0
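
For anyone who wants to reproduce the unreliable TCP opens without running a capture, a rough sketch using curl follows. The function name `tcp_probe`, the 5-second connect timeout, and the default of 10 attempts are arbitrary choices for illustration, not values taken from the capture above.

```shell
# Hypothetical helper: attempt N connections to a URL and report how many failed.
tcp_probe() {
  url=$1
  attempts=${2:-10}
  failed=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --connect-timeout bounds the connection phase; -o /dev/null discards the body.
    # Any non-zero curl exit (timeout, refused, TLS error) is counted as a failure.
    if ! curl -s -o /dev/null --connect-timeout 5 "$url"; then
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "$failed/$attempts connections failed"
}
```

Running something like `tcp_probe https://1.1.1.1/ 20` from an affected and an unaffected network should make the difference visible. Note that this counts TLS and other curl errors as failures too, so it is a coarser signal than the SYN retransmissions in the capture.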

Traceroute for completeness:

$ mtr -c 5 --report 1.1.1.1
Start: 2018-12-28T10:19:02+1100
HOST: x1                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.1                0.0%     5    1.6   2.5   1.6   4.4   1.1
  2.|-- 10.20.22.151               0.0%     5   23.6  29.7  19.7  49.2  12.7
  3.|-- nme-apt-bur-wgw1-be-10.tp  0.0%     5   25.6  30.9  19.6  44.6  10.5
  4.|-- 203-219-107-206.static.tp  0.0%     5   28.7  30.8  20.4  44.6  11.0
  5.|-- bundle-ether-13.win-edge9  0.0%     5   42.2  35.0  21.0  48.0  11.6
  6.|-- bundle-ether2.lon-edge901  0.0%     5   19.9  29.9  19.8  48.3  13.6
  7.|-- ???                       100.0     5    0.0   0.0   0.0   0.0   0.0

Anecdotally, all my DNS queries take a long time (using dnscrypt + Cloudflare DoH). Connecting via non-Telstra-routed networks does not exhibit the issue.

Two forum posts from TPG and Telstra users making independent reports:

Turns out I was wrong about this. Extremely slow or timed-out TCP opens are affecting any Cloudflare-hosted site, not just 1.1.1.1:

$ curl -m 10 -X GET -I https://whirlpool.net.au
curl: (7) Failed to connect to whirlpool.net.au port 443: Connection timed out

Thank you, and sorry for the issues. We are aware of the problem and have worked with Telstra to isolate it. If you’re still seeing issues, please do let support know.