Sites that are aliased to [*.]ec.azureedge.net cannot be resolved if DNSSEC is on

I’m using systemd-resolved on Arch Linux with DNSSEC set to the default (allow-downgrade) and DNS-over-TLS set to opportunistic. I’ve configured it to use Cloudflare’s 1.1.1.1 DNS.

This is what it looks like without DNSSEC:

dig www.minecraft.net @1.1.1.1

; <<>> DiG 9.14.3 <<>> www.minecraft.net @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37776
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;www.minecraft.net.		IN	A

;; ANSWER SECTION:
www.minecraft.net.	47	IN	CNAME	aemprod.azureedge.net.
aemprod.azureedge.net.	318	IN	CNAME	aemprod.ec.azureedge.net.
aemprod.ec.azureedge.net. 916	IN	CNAME	scdne00a.wpc.9dfdf.nucdn.net.
scdne00a.wpc.9dfdf.nucdn.net. 916 IN	CNAME	sni1gl.wpc.nucdn.net.
sni1gl.wpc.nucdn.net.	916	IN	A	152.199.20.92

;; Query time: 124 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Τετ Ιουν 26 03:26:05 EEST 2019
;; MSG SIZE  rcvd: 183

All good. Now without specifying the resolver with @, thereby using systemd-resolved:

dig www.minecraft.net

; <<>> DiG 9.14.3 <<>> www.minecraft.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 9416
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;www.minecraft.net.		IN	A

;; Query time: 3209 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Τετ Ιουν 26 03:30:05 EEST 2019
;; MSG SIZE  rcvd: 46

And the culprits can be found in journalctl:

systemd-resolved[536]: DNSSEC validation failed for question ec.azureedge.net IN SOA: failed-auxiliary
systemd-resolved[536]: DNSSEC validation failed for question aemprod.ec.azureedge.net IN A: failed-auxiliary
systemd-resolved[536]: DNSSEC validation failed for question aemprod.ec.azureedge.net IN AAAA: failed-auxiliary
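
To see at a glance which names are affected, journal lines like the ones above can be grouped by name. This is only a minimal sketch, assuming the log format stays exactly as shown; the function name failed_questions is made up for illustration and is not part of systemd.

```python
import re
from collections import defaultdict

# Matches journal lines of the form shown above, e.g.
# "DNSSEC validation failed for question ec.azureedge.net IN SOA: failed-auxiliary"
PATTERN = re.compile(
    r"DNSSEC validation failed for question (\S+) IN (\S+): (\S+)"
)


def failed_questions(lines):
    """Return {name: sorted list of record types} for failed validations.

    Hypothetical helper: it only understands the log format quoted above.
    """
    failures = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            name, rtype, _reason = match.groups()
            failures[name].add(rtype)
    return {name: sorted(types) for name, types in failures.items()}
```

One way to use it would be to pass in the lines from journalctl -u systemd-resolved and print the resulting dictionary; every failing name should sit under ec.azureedge.net if the problem is limited to that zone.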

If I have systemd-resolved use Google’s DNS, it works fine.
For now my temporary workaround is to map www.minecraft.net to the final IP address in the hosts file.

Any idea what’s causing the DNSSEC verification to fail?

UPDATE: It happens with download.windowsupdate.com as well. Seems like it’s an issue with [*.]ec.azureedge.net and Cloudflare DNS:

systemd-resolved[540]: DNSSEC validation failed for question ec.azureedge.net IN SOA: failed-auxiliary
systemd-resolved[540]: DNSSEC validation failed for question wu.ec.azureedge.net IN AAAA: failed-auxiliary
systemd-resolved[540]: DNSSEC validation failed for question wu.ec.azureedge.net IN A: failed-auxiliary

Normal resolution: (no DNSSEC)

dig download.windowsupdate.com @1.1.1.1

; <<>> DiG 9.14.3 <<>> download.windowsupdate.com @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53567
;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;download.windowsupdate.com.	IN	A

;; ANSWER SECTION:
download.windowsupdate.com. 2785 IN	CNAME	2-01-3cf7-0009.cdx.cedexis.net.
2-01-3cf7-0009.cdx.cedexis.net.	130 IN	CNAME	wu.azureedge.net.
wu.azureedge.net.	1265	IN	CNAME	wu.ec.azureedge.net.
wu.ec.azureedge.net.	5	IN	CNAME	wu.wpc.apr-52dd2.edgecastdns.net.
wu.wpc.apr-52dd2.edgecastdns.net. 232 IN CNAME	hlb.apr-52dd2-0.edgecastdns.net.
hlb.apr-52dd2-0.edgecastdns.net. 232 IN	CNAME	cs11.wpc.v0cdn.net.
cs11.wpc.v0cdn.net.	2844	IN	A	93.184.221.240

;; Query time: 152 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Τετ Ιουν 26 11:53:31 EEST 2019
;; MSG SIZE  rcvd: 264

I don’t know much about systemd-resolved, but can you figure out what it’s doing and what’s failing with some sort of detailed logging, or tcpdump or Wireshark?

Edit: Also, does it work with Quad9?

It fails with Quad9 as well. It works with Google and it also works with the Cloudflare DNS over HTTPS implementation in Firefox.

Either a DNS-over-TLS or a systemd-resolved thing. Weird. I guess I could also try stubby? What do you think?

Using stubby configured for Cloudflare (and confirmed to be working through 1.1.1.1/help), www.minecraft.net resolves fine.

upstream_recursive_servers:

## Cloudflare 1.1.1.1 and 1.0.0.1
  - address_data: 1.1.1.1
    tls_auth_name: "cloudflare-dns.com"
  - address_data: 1.0.0.1
    tls_auth_name: "cloudflare-dns.com"

## Cloudflare servers
  - address_data: 2606:4700:4700::1111
    tls_auth_name: "cloudflare-dns.com"
  - address_data: 2606:4700:4700::1001
    tls_auth_name: "cloudflare-dns.com"

I’ve seen issues with systemd-resolved and Cloudflare mentioned before, in this Reddit post. Seems like it’s the program’s fault. What causes it, though? Time to search.

I guess it’s a “resolved” issue then. (get it?)

I’m wondering if it’s this issue:

ec.azureedge.net DS returns SERVFAIL from most(?) resolvers, but NODATA from 8.8.8.8.
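
For anyone comparing resolvers by hand: NODATA is not an rcode of its own, it is a NOERROR response with an empty answer section, so it is easy to misread in dig output. The sketch below tells the cases apart from dig's header lines. It assumes the header format shown in the dig output earlier in this thread, and the function name classify is made up for illustration.

```python
import re

# Patterns for the two dig header lines, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37776"
# ";; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1"
STATUS = re.compile(r"status: (\w+)")
ANSWERS = re.compile(r"ANSWER: (\d+)")


def classify(dig_output):
    """Return "NODATA", the rcode name (e.g. "SERVFAIL"), or "UNKNOWN".

    Hypothetical helper: it only reads the header lines of dig's output.
    """
    status = STATUS.search(dig_output)
    answers = ANSWERS.search(dig_output)
    if not status or not answers:
        return "UNKNOWN"
    rcode = status.group(1)
    # NOERROR with zero answers means the name exists but has no such record.
    if rcode == "NOERROR" and int(answers.group(1)) == 0:
        return "NODATA"
    return rcode
```

Feeding it the output of dig ec.azureedge.net DS @1.1.1.1 and of the same query against @8.8.8.8 should show whether the two resolvers still disagree.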

If it is, logs/tcpdump could probably confirm it.

Edit: Or:

I enabled debug logging through the service file and here’s what I get when I run host www.minecraft.net:

https://hastebin.com/vurefimaze.http

I guess it’s that issue, then.

systemd-resolved[19640]: Transaction 8468 for <ec.azureedge.net IN DS> scope dns on wlp3s0/*.
systemd-resolved[19640]: Using feature level TLS+EDNS0+D0 for transaction 8468.
systemd-resolved[19640]: Using DNS server 1.1.1.1 for transaction 8468.
systemd-resolved[19640]: Sending query via TCP since UDP isn't supported.
systemd-resolved[19640]: Using feature level TLS+EDNS0+D0 for transaction 8468.
systemd-resolved[19640]: Processing incoming packet on transaction 8468 (rcode=SERVFAIL).
systemd-resolved[19640]: Server returned error SERVFAIL, retrying transaction with reduced feature level TLS+EDNS0.
systemd-resolved[19640]: Retrying transaction 8468.
systemd-resolved[19640]: Cache miss for ec.azureedge.net IN DS
systemd-resolved[19640]: Transaction 8468 for <ec.azureedge.net IN DS> scope dns on wlp3s0/*.
systemd-resolved[19640]: Using feature level TLS+EDNS0 for transaction 8468.
systemd-resolved[19640]: Sending query via TCP since UDP isn't supported.
systemd-resolved[19640]: Using feature level TLS+EDNS0 for transaction 8468.
systemd-resolved[19640]: Transaction 8468 for <ec.azureedge.net IN DS> on scope dns on wlp3s0/* now complete with <invalid-reply> from none (unsigned).
systemd-resolved[19640]: Auxiliary DNSSEC RR query failed with invalid-reply

So you’re saying it’s the issue you mentioned in your edit?
(https://github.com/systemd/systemd/issues/8897)

Yeah. The circumstances are different, but it’s the same systemd issue – a DS query returning SERVFAIL.

I have trouble following the discussion in the issue. Why does it work in stubby? I understand that the reason it works in systemd-resolved with Google DNS is that Google doesn’t return SERVFAIL, yes?

Well, it’s a bug in systemd’s DNSSEC validation. Either Stubby doesn’t validate DNSSEC, or it isn’t buggy. 🙂

I’m not sure why 8.8.8.8 returns NODATA when at least 3 other implementations return SERVFAIL.

So there’s nothing I can do that doesn’t involve replacing systemd-resolved? Oh man…
That or using Google DNS.

You could probably also keep using systemd-resolved but turn off DNSSEC validation (DNSSEC=no in resolved.conf).

Indeed, but then is it really the fancy Secure DNS anymore?

I’ll use Cloudflare DoH with dnscrypt-proxy for now. It has a cache setting 🙂