1.1.1.1 Inconsistent Behavior (Invalid Subdomain Caching?)


#1

Background: We run a page at https://tenta.com/test that gives our users some insight into their network environment.

One of the tests we run is a DNS leak test, which tells users which DNS resolvers can see their browsing data and which features those resolvers support (TLS, etc.).

We’ve noticed that these reports fail to load when 1.1.1.1 is the resolver.

Reproduction: The failure reproduces reliably with the following curl command:

curl -vvv -L --dns-servers 1.1.1.1 https://nstoro.com/api/v1/randomizer

With any other public resolver (9.9.9.9 in this example), it works correctly:

curl -vvv -L --dns-servers 9.9.9.9 https://nstoro.com/api/v1/randomizer

Expected Result: The expected result is an HTTP 302 redirect followed by a JSON object with information about the IP address of the recursive resolver that queried our name server, as well as information about the name server that answered the request. For example (I’ve elided some of the response data for brevity):

{
	"status": "OK",
	"type": "TENTA_NSNITCH",
	"data": {
		"ip": "74.63.25.247",
		"ip_family": "v4",
		"net_type": "udp",
		"tls_enabled": false,
		<snip>
	},
	"message": "",
	"code": 200
}

This output is generated when the recursive resolver makes a request to our name server shortly before the HTTP request is made.

Actual Output: The actual output is a 404 error message like:

{
	"status": "ERROR",
	"type": "TENTA_NSNITCH",
	"data": null,
	"message": "Not Found",
	"code": 404
}

This response indicates that the recursive resolver did not query our name server shortly before the HTTP request was made. The test works correctly with at least Google, Quad9, Tenta DNS, a range of OpenNIC servers, and Level 3 as the resolver; it only seems to fail with 1.1.1.1.

Note: Some Linux distributions build curl without c-ares support, which the --dns-servers option requires. On such a system, curl can be rebuilt with the --enable-ares configure flag to enable this option.

Discussion: We believe that CloudFlare is either inferring the IP addresses of sub-domains without actually looking them up, or looking them up but returning a cached result from a previous lookup of a different sub-domain (pointing to a server with a different IP than the one we returned for this lookup). In either case, this is non-standard behavior. Each call to the randomizer API produces a new, random sub-domain, and while collisions are not impossible, there is enough entropy that they should be vanishingly rare. As such, each request should trigger at least one query from the recursive resolver to our name server.
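To put a rough number on "vanishingly rare", the birthday bound gives the chance that any two of n random labels collide. The randomizer’s label format isn’t described in this thread, so the sketch below assumes, purely hypothetically, 16 characters drawn from [a-z0-9]; ALPHABET_SIZE and LABEL_LENGTH would need to match the real generator.

```python
import math

# Hypothetical parameters: the thread does not say how the randomizer
# builds its sub-domains. Assume 16 characters from [a-z0-9].
ALPHABET_SIZE = 36
LABEL_LENGTH = 16


def collision_probability(n_names: int, alphabet: int = ALPHABET_SIZE,
                          length: int = LABEL_LENGTH) -> float:
    """Birthday-bound approximation of P(at least two of n random labels collide)."""
    space = alphabet ** length  # number of distinct possible labels
    # p ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision for tiny values
    return -math.expm1(-n_names * (n_names - 1) / (2 * space))


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"{n:>13,} names -> p ~= {collision_probability(n):.3e}")
```

Under those assumed parameters, even a million generated names leaves the collision probability below 1e-13, so a cached answer for a colliding name is not a plausible explanation for consistent failures.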

The full source code of our server is available at https://github.com/tenta-browser/tenta-dns


#2

This can be reproduced simply by running Knot Resolver locally:

docker run -Pti --rm cznic/knot-resolver
# (then run verbose(true) on the CLI)

and

$ dig +noall +answer @127.0.0.1 -p 32770 test-1002.nstoro.com
test-1002.nstoro.com.   0       IN      A       147.75.32.83

results in:

$ sudo tcpdump -i wlp4s0  'port 53'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wlp4s0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:37:52.976238 IP 192.168.1.52.41926 > 99.192.182.11.domain: 33417 [1au] A? Test-1002.NsToRO.cOM. (49)
09:37:53.175807 IP 192.168.1.52.56020 > 99.192.182.11.domain: 33417 [1au] A? Test-1002.NsToRO.cOM. (49)
09:37:53.296139 IP 99.192.182.11.domain > 192.168.1.52.41926: 33417*- 1/0/0 A 147.75.32.83 (54)
09:37:53.495942 IP 99.192.182.11.domain > 192.168.1.52.56020: 33417*- 1/0/0 A 147.75.32.83 (54)

yet:

$ curl --resolve test-1002.nstoro.com:443:147.75.32.83 https://test-1002.nstoro.com/api/v1/report
{"status":"ERROR","type":"TENTA_NSNITCH","data":null,"message":"Not Found","code":404}

The capture clearly shows the query going out to, and the response coming back from, the name server listed in the glue record for nstoro.com.

It seems like something may be off with the way your name servers are delegated, but it’s hard to tell what you intended.


#3

Does the software properly handle DNS queries that aren’t in lowercase? It responds to them, which is good, but do they get misfiled somehow?

kresd and 1.1.1.1 randomize capitalization in queries to authoritative servers, which is a relatively rare configuration.
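For readers unfamiliar with the technique, here is a minimal Python sketch of what that randomization looks like on the resolver side. It is not kresd’s or 1.1.1.1’s actual code; the function name randomize_case is hypothetical, and it flips only ASCII letters, leaving digits, hyphens and dots alone.

```python
import secrets


def randomize_case(qname: str) -> str:
    """Give each ASCII letter a random case (DNS 0x20 style); keep other characters."""
    out = []
    for ch in qname:
        if ch.isascii() and ch.isalpha():
            # one random bit per letter decides upper vs. lower case
            out.append(ch.upper() if secrets.randbits(1) else ch.lower())
        else:
            out.append(ch)  # digits, hyphens and dots are left untouched
    return "".join(out)


# Output varies per call; the tcpdump capture in #2 shows one such result.
print(randomize_case("test-1002.nstoro.com."))
```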


#4

Huh, I swear I had tried case randomization and it was working well. I must have accidentally reused a domain.

Case sensitivity is definitely the answer: if the Host header uses the same mixed case as the DNS query when you check the result, the record is there.
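A minimal sketch of why the lookup misses, assuming (hypothetically, since the tenta-dns storage code isn’t shown here) that the name server stores each query name exactly as received and the report endpoint looks up the Host header verbatim. The names seen_by_name, record_query and lookup_report are illustrative only, and 192.0.2.1 is a documentation address.

```python
# Hypothetical in-memory store: which resolver asked for each name,
# keyed by the query name exactly as it arrived on the wire.
seen_by_name: dict[str, str] = {}


def record_query(qname: str, resolver_ip: str) -> None:
    seen_by_name[qname] = resolver_ip  # BUG: the key keeps the 0x20 mixed case


def lookup_report(host_header: str) -> int:
    """Return an HTTP-like status code for /api/v1/report."""
    return 200 if host_header in seen_by_name else 404


record_query("miXedCaseABC123.nstoro.com", "192.0.2.1")
print(lookup_report("miXedCaseABC123.nstoro.com"))  # 200
print(lookup_report("mixedcaseabc123.nstoro.com"))  # 404
```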

$ dig +short @ns1.nstoro.com miXedCaseABC123.nstoro.com
147.75.89.235

$ curl -i https://miXedCaseABC123.nstoro.com/api/v1/report
HTTP/2 200

$ curl -i https://mixedcaseabc123.nstoro.com/api/v1/report
HTTP/2 404

#5

Correct. Random capitalization is something to watch out for: the tool matching DNS and HTTP requests must use a case-insensitive comparison, because domain names are case-insensitive. We use something similar on https://1.1.1.1/help:
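One way to do that comparison, sketched in Python: normalize both the DNS query name and the HTTP Host header to the same canonical key before storing or looking it up. ResolverLog, ascii_fold and canonical are hypothetical names, not part of tenta-dns; the folding deliberately touches only ASCII A-Z, since that is the only range DNS treats case-insensitively.

```python
from typing import Optional


def ascii_fold(name: str) -> str:
    """Lower-case only ASCII A-Z; every other character is left as-is."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def canonical(name: str) -> str:
    # Also drop the trailing root dot so "a.com." and "a.com" share a key.
    return ascii_fold(name).rstrip(".")


class ResolverLog:
    """Hypothetical store matching DNS queries to HTTP requests case-insensitively."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def record_query(self, qname: str, resolver_ip: str) -> None:
        self._seen[canonical(qname)] = resolver_ip

    def lookup(self, host_header: str) -> Optional[str]:
        return self._seen.get(canonical(host_header))


log = ResolverLog()
log.record_query("Test-1002.NsToRO.cOM.", "192.0.2.1")
print(log.lookup("test-1002.nstoro.com"))  # 192.0.2.1
```

Normalizing at both ends keeps the store itself a plain dictionary, so the lookup stays a single hash probe regardless of how the resolver capitalized the query.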

$ curl https://$RANDOM.map.cloudflareresolve.com -s | jq .
{
  "ip": "162.158.252.232",
  "ip_version": 1,
  "protocol": "udp",
  "dnssec": true,
  "edns": 0,
  "client_subnet": -1,
  "isp": {
    "asn": 13335,
    "name": "Cloudflare"
  }
}

#6

First and foremost, thanks to everyone for helping us track this down.

We had a long discussion about this today. We don’t think this behavior is really in compliance with RFC4343, but we’ve gone ahead and added a fix anyway. Nonetheless, there’s something … let’s just say our spidey senses are tingling. We’re not sure how, but intentionally randomizing case feels like it’s going to come back and bite somebody in the ass; it feels like it could provide the basis for some kind of differential attack. We haven’t been able to articulate an actual attack yet, but it wouldn’t pass code review here.

At the very least, we’d propose that it not be used over an already secure channel such as TLS or HTTPS.


#7

The resolver randomizes letter case to add entropy against Kaminsky-style cache-poisoning attacks; see https://tools.ietf.org/html/draft-vixie-dnsext-dns0x20-00 and https://dyn.com/blog/use-of-bit-0x20-in-dns-labels/
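A rough sketch of where the extra entropy comes from: a response is accepted only if it echoes the question name with exactly the same case, so each ASCII letter adds one bit that an off-path attacker must guess, on top of the 16-bit query ID. This is a simplification that ignores source-port randomization, and the function names are hypothetical.

```python
def extra_entropy_bits(qname: str) -> int:
    """Each ASCII letter contributes one bit of 0x20 entropy (its case)."""
    return sum(1 for c in qname if c.isascii() and c.isalpha())


def response_matches(sent_qname: str, echoed_qname: str) -> bool:
    """Accept a response only if it echoes the question's case exactly.

    A spoofer who never saw the query has to guess every letter's case.
    """
    return sent_qname == echoed_qname


sent = "Test-1002.NsToRO.cOM."
print(extra_entropy_bits(sent))                         # 13
print(16 + extra_entropy_bits(sent))                    # 29 (query ID + case bits)
print(response_matches(sent, "Test-1002.NsToRO.cOM."))  # True
print(response_matches(sent, "test-1002.nstoro.com."))  # False
```

For this name that’s 13 extra bits; short or digit-heavy names gain correspondingly less.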

Why do you think it’s not in compliance with RFC4343? That RFC just clarifies case insensitivity with respect to non-ASCII bytes.
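For reference, the rule being discussed, sketched in Python on wire-format label bytes: as we read RFC4343, only the octets for A-Z and a-z compare case-insensitively, and every other octet must match exactly. Python’s bytes.lower() folds only ASCII uppercase letters, which is exactly that rule, whereas str.lower() also folds non-ASCII letters and would be wrong for DNS. labels_equal is a hypothetical helper.

```python
def labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two wire-format labels: only ASCII letters fold case."""
    # bytes.lower() changes only 0x41-0x5A; octets 0x80-0xFF must match exactly.
    return a.lower() == b.lower()


print(labels_equal(b"NsToRO", b"nstoro"))      # True
print(labels_equal(b"caf\xc9", b"caf\xe9"))    # False: 0xC9/0xE9 are not ASCII
print("caf\xc9".lower() == "caf\xe9")          # True: str.lower() over-folds
```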


#8

I see your draft-vixie-dnsext-dns0x20, which, I might note, has expired.

We’ve had RFC4343 up on the wall here since yesterday. One of my colleagues thinks you’re within the letter of the law of section 4; I really don’t think you are, but I suppose reasonable people can differ here. What I am sure of is that randomizing case isn’t a standard and certainly goes against the spirit of the standard.

In fact, this is exactly the kind of thing CloudFlare has rightly railed against in TLS 1.3: people abusing as-implemented behavior. If you want it standardized, then standardize it.

In any case, if you really want to improve the security of the resolver <-> authority leg, then start discovering authorities that support TLS and querying them over it. Then the fragility of UDP responses becomes irrelevant.


#9

I agree it’s not a panacea. We’re actually testing TLS with authorities; let me know if you’d like to participate ([email protected]).