1.1.1.1 DNS not responding inside WARP from China

Starting a few days ago, my DNS queries to 1.1.1.1 inside a WARP tunnel started to time out (no response from the server).

dig reports “timed out” for queries:

ubuntu@ubuntu:~$ dig @1.1.1.1 www.cloudflare.com.

; <<>> DiG 9.16.1-Ubuntu <<>> @1.1.1.1 www.cloudflare.com.
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

ubuntu@ubuntu:~$

tcpdump on the WireGuard interface shows no response from the remote side:

root@ubuntu:~# tcpdump -ni wgcf host 1.1.1.1 and udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wgcf, link-type RAW (Raw IP), capture size 262144 bytes
20:35:15.858886 IP 172.16.0.2.41386 > 1.1.1.1.53: 44197+ [1au] A? www.cloudflare.com. (59)
20:35:20.859508 IP 172.16.0.2.41386 > 1.1.1.1.53: 44197+ [1au] A? www.cloudflare.com. (59)
20:35:25.863512 IP 172.16.0.2.41386 > 1.1.1.1.53: 44197+ [1au] A? www.cloudflare.com. (59)
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
root@ubuntu:~#

However, Google’s public DNS (8.8.8.8) works:

root@ubuntu:~# tcpdump -ni wgcf host 8.8.8.8 and udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wgcf, link-type RAW (Raw IP), capture size 262144 bytes
20:37:42.187690 IP 172.16.0.2.58804 > 8.8.8.8.53: 41116+ [1au] A? www.cloudflare.com. (59)
20:37:42.281619 IP 8.8.8.8.53 > 172.16.0.2.58804: 41116$ 2/0/1 A 104.16.123.96, A 104.16.124.96 (79)
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
root@ubuntu:~#

In fact, I tried these 4 addresses and they all gave no response:

  • 1.1.1.1
  • 1.0.0.1
  • 2606:4700:4700::1111
  • 2606:4700:4700::1001

Strangely enough, TCP and ICMP are fine. Using mtr, I can reach 1.1.1.1 through the WireGuard interface in just 1 hop. Similarly, other Cloudflare IP addresses are also responding in 1 hop.

ping gets replies, and using curl I can reach 1.1.1.1 via HTTPS:

ubuntu@ubuntu:~$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=70.7 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=64 time=71.7 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=64 time=78.0 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=64 time=66.8 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=64 time=63.0 ms
64 bytes from 1.1.1.1: icmp_seq=6 ttl=64 time=63.6 ms
^C
--- 1.1.1.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5001ms
rtt min/avg/max/mdev = 63.002/68.983/78.006/5.175 ms
ubuntu@ubuntu:~$ curl -s https://1.1.1.1/cdn-cgi/trace
fl=23f436
h=1.1.1.1
ip=<redacted>
ts=1640521591.386
visit_scheme=https
uag=curl/7.68.0
colo=HKG
http=http/2
loc=CN
tls=TLSv1.3
sni=off
warp=plus
gateway=off
ubuntu@ubuntu:~$
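
For anyone comparing several tunnels, the trace output above is plain key=value text and is easy to parse. A minimal Python sketch follows; the helper name parse_trace is made up for illustration, and the sample string is a shortened copy of the output above.

```python
def parse_trace(text):
    """Parse the key=value lines returned by /cdn-cgi/trace into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields


# Shortened copy of the trace output shown above.
sample = """fl=23f436
h=1.1.1.1
colo=HKG
loc=CN
warp=plus
gateway=off"""

trace = parse_trace(sample)
print(trace["colo"], trace["fl"], trace["warp"])
```

The colo, fl and warp fields are the ones that matter for this report: they show which POP and node the tunnel lands on and whether the request actually went through WARP.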

WireGuard itself is also fine:

ubuntu@ubuntu:~$ sudo wg show wgcf
interface: wgcf
  public key: <redacted>
  private key: (hidden)
  listening port: 3

peer: bmXOC+F1FxEMF9dyiK2H5/1SUtzH0JuVo51h2wPfgyo=
  endpoint: [2606:4700:d0::a29f:c001]:1701
  allowed ips: ::/0, 0.0.0.0/0
  latest handshake: 32 seconds ago
  transfer: 1.11 MiB received, 2.10 MiB sent
  persistent keepalive: every 20 seconds
ubuntu@ubuntu:~$

I am connecting to WARP using standard WireGuard tools (Ubuntu 20.04 LTS, Linux 5.11, stock wireguard.ko kernel module for support). I have two sources connecting to WARP (two discrete machines using two sets of WireGuard config):

  • 2001:da8::/32 (AS23910) to [2606:4700:d0::a29f:c001]:1701
  • 202.111.192.0/19 (AS4809) to 162.159.192.5:2408

I could provide a traceroute for the outer link (the carrier of the WireGuard tunnel) if needed. It may or may not be necessary because both TCP and ICMP work inside the tunnel.

I also have an alternative location (Hong Kong) connecting to WARP, and it’s working fine. Only WARP tunnels from mainland China are experiencing this DNS issue. Strangely, all clients are connecting to the same Cloudflare HKG POP.

Independently of me, several of my friends have also reported the same problem from their equipment. They’re all connecting from mainland China.

Will Cloudflare investigate this?

Looks like no one’s interested. Maybe time for @MoreHelp ?

Hi @ibugone, as you mentioned, TCP is working fine. Can you run a TCP DNS query with NSID enabled (dig @1.1.1.1 cloudflare.com +nsid)? Also, I’m not quite sure if WARP is fully functional in China.

@anb Nope. TCP queries to 1.1.1.1 are also failing:

ubuntu@ubuntu:~$ ip r g 1.1.1.1
1.1.1.1 dev wgcf table special src 172.16.0.2 uid 1000
    cache
ubuntu@ubuntu:~$ dig @1.1.1.1 cloudflare.com. A +nsid

; <<>> DiG 9.16.1-Ubuntu <<>> @1.1.1.1 cloudflare.com. A +nsid
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

9|ubuntu@ubuntu:~$ dig @1.1.1.1 cloudflare.com. A +nsid +tcp
;; Connection to 1.1.1.1#53(1.1.1.1) for cloudflare.com. failed: timed out.
;; Connection to 1.1.1.1#53(1.1.1.1) for cloudflare.com. failed: timed out.

; <<>> DiG 9.16.1-Ubuntu <<>> @1.1.1.1 cloudflare.com. A +nsid +tcp
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

;; Connection to 1.1.1.1#53(1.1.1.1) for cloudflare.com. failed: timed out.
9|ubuntu@ubuntu:~$

tcpdump shows no response:

ubuntu@ubuntu:~$ sudo tcpdump -ni wgcf
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wgcf, link-type RAW (Raw IP), capture size 262144 bytes
19:31:26.249768 IP 172.16.0.2.33348 > 1.1.1.1.53: 24484+ [1au] A? cloudflare.com. (59)
19:31:31.253481 IP 172.16.0.2.33348 > 1.1.1.1.53: 24484+ [1au] A? cloudflare.com. (59)
19:31:36.257490 IP 172.16.0.2.33348 > 1.1.1.1.53: 24484+ [1au] A? cloudflare.com. (59)
19:31:43.382289 IP 172.16.0.2.32965 > 1.1.1.1.53: Flags [S], seq 3226865681, win 64860, options [mss 1380,sackOK,TS val 2233872915 ecr 0,nop,wscale 9], length 0
19:31:44.409477 IP 172.16.0.2.32965 > 1.1.1.1.53: Flags [S], seq 3226865681, win 64860, options [mss 1380,sackOK,TS val 2233873943 ecr 0,nop,wscale 9], length 0
19:31:46.425664 IP 172.16.0.2.32965 > 1.1.1.1.53: Flags [S], seq 3226865681, win 64860, options [mss 1380,sackOK,TS val 2233875959 ecr 0,nop,wscale 9], length 0
19:31:50.585459 IP 172.16.0.2.32965 > 1.1.1.1.53: Flags [S], seq 3226865681, win 64860, options [mss 1380,sackOK,TS val 2233880119 ecr 0,nop,wscale 9], length 0
19:31:53.385662 IP 172.16.0.2.58367 > 1.1.1.1.53: Flags [S], seq 954753115, win 64860, options [mss 1380,sackOK,TS val 2233882919 ecr 0,nop,wscale 9], length 0
19:31:54.397454 IP 172.16.0.2.58367 > 1.1.1.1.53: Flags [S], seq 954753115, win 64860, options [mss 1380,sackOK,TS val 2233883931 ecr 0,nop,wscale 9], length 0
19:31:56.409454 IP 172.16.0.2.58367 > 1.1.1.1.53: Flags [S], seq 954753115, win 64860, options [mss 1380,sackOK,TS val 2233885943 ecr 0,nop,wscale 9], length 0
19:32:00.569462 IP 172.16.0.2.58367 > 1.1.1.1.53: Flags [S], seq 954753115, win 64860, options [mss 1380,sackOK,TS val 2233890103 ecr 0,nop,wscale 9], length 0
19:32:03.385689 IP 172.16.0.2.52067 > 1.1.1.1.53: Flags [S], seq 1438146797, win 64860, options [mss 1380,sackOK,TS val 2233892919 ecr 0,nop,wscale 9], length 0
19:32:04.409451 IP 172.16.0.2.52067 > 1.1.1.1.53: Flags [S], seq 1438146797, win 64860, options [mss 1380,sackOK,TS val 2233893943 ecr 0,nop,wscale 9], length 0
19:32:06.425452 IP 172.16.0.2.52067 > 1.1.1.1.53: Flags [S], seq 1438146797, win 64860, options [mss 1380,sackOK,TS val 2233895959 ecr 0,nop,wscale 9], length 0
19:32:10.553464 IP 172.16.0.2.52067 > 1.1.1.1.53: Flags [S], seq 1438146797, win 64860, options [mss 1380,sackOK,TS val 2233900087 ecr 0,nop,wscale 9], length 0
^C
15 packets captured
15 packets received by filter
0 packets dropped by kernel
ubuntu@ubuntu:~$

Traceroutes to 1.1.1.1 TCP 53 give no response at all. Traceroutes from the WireGuard interface to Cloudflare IPs (both v4 and v6) complete in one single hop.

As far as I’m concerned, WARP is working for me. If I understand correctly, everything inside the interface wgcf is protected by modern cipher suites employed by the WireGuard protocol, and a MITM should not be able to read anything (provided my credentials are protected). It’s also confusing that Google’s 8.8.8.8 through WARP provides correct DNS answers, with either +tcp or +notcp.
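
For anyone who wants to repeat the UDP and TCP checks without dig, here is a rough Python sketch of the same two probes. It is only an approximation of what dig does: the query carries no EDNS/NSID option, it is IPv4 only, and the function names are made up for illustration. It assumes the route to the server already goes through the wgcf interface, as in the ip r g output above.

```python
import socket
import struct


def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query (no EDNS) for `name`; qtype 1 is an A record."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def probe_udp(server, name, timeout=5.0):
    """Return True if `server` answers a UDP query on port 53 within `timeout`."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable errors
            return False
    return reply[:2] == query[:2]  # transaction ID must match


def probe_tcp(server, name, timeout=5.0):
    """Return True if `server` answers the same query over TCP port 53."""
    query = build_query(name)
    try:
        with socket.create_connection((server, 53), timeout=timeout) as sock:
            # DNS over TCP prefixes each message with a 2-byte length
            sock.sendall(struct.pack("!H", len(query)) + query)
            reply = sock.recv(4096)
    except OSError:
        return False
    return len(reply) >= 4 and reply[2:4] == query[:2]


# Example usage (needs network access, so it is left commented out):
# for server in ("1.1.1.1", "8.8.8.8"):
#     print(server, probe_udp(server, "cloudflare.com"), probe_tcp(server, "cloudflare.com"))
```

With the behaviour described in this post, both probes would be expected to return False for 1.1.1.1 and True for 8.8.8.8.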

I have the same issue from Belgium (WireGuard with a WARP tunnel on pfSense, also set up with wgcf).
Other servers such as 8.8.8.8 work for me…
Only the Cloudflare ones are not working over WARP on TCP/UDP 53 or TLS 853 :confused:
I also tested 172.64.36.1, 172.64.36.2 and the IPv6 Cloudflare servers, with the same result.

I’m not able to reproduce it. Have you tried the official app? 3rd party configurations are not officially supported unfortunately.

It seems to be the expected behaviour that requests to Cloudflare servers don’t go through the tunnel… Can someone confirm?
So only non-Cloudflare requests would go through, and they would call it DNS over WARP…?
I see quite a lot of reports of this same behaviour…
@kkrum explains this in Cloudflare dns data center location LUX - #7 by kkrum
and the https://1.1.1.1/help results seem to confirm it.

I really would like to know whether it’s an issue or the expected behaviour :confused:

@thedaveCA also describes the same thing in Protocol option! - #3 by thedaveCA

@anb I didn’t find a way to run some “app” on Linux, so I stuck with the standard WireGuard implementation that comes with Linux 5.6. The only Linux-y instructions I can find are Set up 1.1.1.1 - Linux, which was not what I was looking for.

@anb Hi, I got some updates. When connecting to Cloudflare HKG POP from China, I always get fl=23f467 or fl=23f436 from GET /cdn-cgi/trace.

However when connecting over IPv4, I get routed to the LAX POP with fl=445f15, and UDP DNS starts working (dig +nsid returns 445m15).

I also noticed that my Hong Kong endpoint connects to Cloudflare HKG fl=23f528 and dig +nsid gives 23m528.

I have a feeling that something’s behaving differently with specific nodes within Cloudflare. I hope the above information is helpful.

Summary

Location    Cloudflare POP    fl         UDP working?    dig +nsid
CN 1 (v4)   HKG               23f467     No
CN 1 (v6)   HKG               23f467     No
CN 2 (v4)   LAX               445f15     Yes             445m15
CN 2 (v6)   HKG               23f311     No
CN 2 (v6)   HKG               23f363     No
CN 2 (v6)   HKG               23f436     No
CN 2 (v6)   HKG               23f520     No
CN 2 (v6)   HKG               23f524     No
CN 3 (v4)   LAX               445f116    Yes             445m116
HK 1 (v4)   HKG               23f431     Yes             23m431
HK 1 (v4)   HKG               23f454     Yes             23m454
HK 1 (v4)   HKG               23f523     Yes             23m523
HK 1 (v4)   HKG               23f528     Yes             23m528
HK 1 (v6)   HKG               23f443     No
HK 1 (v6)   HKG               23f496     No
HK 2 (v6)   HKG               23f511     No
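
In every row above where dig +nsid returned something, the NSID equals the fl value with the "f" replaced by "m". That is only a pattern in this data, not a documented Cloudflare convention, but it makes it easy to check whether the HTTP side and the DNS side land on the same node. A small Python sketch, with a hypothetical helper name:

```python
import re


def expected_nsid(fl):
    """Guess the NSID for a given fl value, based only on the rows observed above.

    Assumption: the pattern <digits>f<digits> -> <digits>m<digits> is inferred
    from this thread's data and is not a documented naming scheme.
    """
    match = re.fullmatch(r"(\d+)f(\d+)", fl)
    if match is None:
        raise ValueError(f"unexpected fl format: {fl!r}")
    return f"{match.group(1)}m{match.group(2)}"


print(expected_nsid("23f528"))
print(expected_nsid("445f116"))
```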

The timeouts to our DNS services in DoW (DNS over WARP) mode should be fixed, or at least be in the process of being fixed. As a workaround, you can move to DoT or DoH mode for DNS.

Re inside/outside the tunnel communication. Two notes here:

  1. Our API traffic for things like registration goes outside the tunnel via HTTPS. We do this mainly to ensure we can recover if the tunnel gets into a bad state.

  2. DoT and DoH DNS traffic is always outside the tunnel. The only DNS traffic inside the tunnel is raw naked DNS traffic when in DoW (DNS over WARP) mode.
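
For readers on a stock WireGuard setup rather than the official client, the DoH workaround can be approximated by hand. The Python sketch below builds a request against Cloudflare's public DoH JSON endpoint (cloudflare-dns.com/dns-query with the application/dns-json Accept header). The function names are made up for illustration, and whether the request travels inside or outside the tunnel depends entirely on local routing, so it does not reproduce the official client's behaviour described above.

```python
import json
import urllib.parse
import urllib.request

DOH_URL = "https://cloudflare-dns.com/dns-query"


def build_doh_request(name, rtype="A"):
    """Build (but do not send) a DoH JSON request for `name`."""
    query = urllib.parse.urlencode({"name": name, "type": rtype})
    return urllib.request.Request(
        f"{DOH_URL}?{query}",
        headers={"Accept": "application/dns-json"},
    )


def resolve(name, rtype="A", timeout=5.0):
    """Send the request and return the data field of each answer record."""
    request = build_doh_request(name, rtype)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [answer["data"] for answer in body.get("Answer", [])]


# Example usage (needs network access, so it is left commented out):
# print(resolve("www.cloudflare.com"))
```

Since this goes over TCP 443, it should keep working in the situation reported earlier in the thread, where HTTPS to 1.1.1.1 succeeded while port 53 timed out.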

@kkrum It’s been a good while and I have yet to observe any difference. There’s still no UDP response from Cloudflare IP inside WARP. Should I open a ticket for this?

It has been taking us longer to fix this than expected. We are actually in the process of rolling all clients to use DoH by default. Once the DoW (Raw UDP DNS over WARP) is fixed we may change this default back (or just keep DoH and DoT as the only options, still thinking about it).

Anyone watching this have a scenario where DoW is important now?

@kkrum We’re connecting to Cloudflare WARP using a standard WireGuard client (the kernel module as in Linux 5.6). For compatibility with legacy software, raw UDP DNS is mandatory, and the only thing we can do is route those queries into the WG interfaces. So our traffic on the Internet is still UDP DNS-over-WARP.

While we have local recursive resolvers serving regular applications, special ones that demand direct queries to external services need to be taken care of. Thus we would still like to query 1.1.1.1 through WARP so the special applications can maintain their best performance.

@kkrum I’d like to note that not only raw DNS, but also all UDP traffic inside WARP is gone. This includes HTTP/3 (w/ QUIC) and everything over that. You can verify this with mtr -bzuP <port> <ip>.

@kkrum Just coming back to say thanks. I can observe that the issue has been resolved and all UDP traffic through WARP has been working again for a while, including raw UDP DNS queries. I’m not monitoring these things so I don’t know when exactly it was repaired.

Thanks for following up, good to see that this is working as expected!
