Issue with DNS Resolution - SERVFAIL Error / No Reachable Authority at delegation

What is the name of the domain?

primafuture.com


What is the error message?

22

What is the issue you’re encountering?

EDE(22): No Reachable Authority at delegation

We are experiencing an issue when attempting to resolve the domain primafuture.com using Cloudflare’s DNS resolver at 1.1.1.1. The problem occurs with the following command:

curl --header "accept: application/dns-json" "https://1.1.1.1/dns-query?name=primafuture.com"

This returns the following error:

{"Status":2,"TC":false,"RD":true,"RA":true,"AD":false,"CD":false,"Question":[{"name":"primafuture.com","type":1}],"Comment":["EDE(22): No Reachable Authority at delegation primafuture.com."]}
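The JSON above can be read mechanically: `Status` is the DNS RCODE (2 is SERVFAIL) and the `Comment` field carries the Extended DNS Error (EDE, RFC 8914). The sketch below is a minimal, hypothetical helper (`summarize_doh_json` is not part of any Cloudflare tooling); it assumes the `EDE(n): ...` comment format observed here, which is free text rather than a documented contract.

```python
import json
import re

# Common RCODE values (RFC 1035); only the ones likely to show up here.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

# A few Extended DNS Error info-codes from RFC 8914 relevant to this thread.
EDE_NAMES = {18: "Prohibited", 22: "No Reachable Authority", 23: "Network Error"}


def summarize_doh_json(body):
    """Summarize a response from Cloudflare's DNS-over-HTTPS JSON API."""
    data = json.loads(body)
    status = data.get("Status")
    comments = data.get("Comment", [])
    if isinstance(comments, str):  # be lenient if a single string is returned
        comments = [comments]
    ede = []
    for comment in comments:
        # Assumption: Cloudflare prefixes EDE text with "EDE(<code>)".
        match = re.match(r"EDE\((\d+)\)", comment)
        if match:
            code = int(match.group(1))
            ede.append((code, EDE_NAMES.get(code, "unknown")))
    return {
        "rcode": RCODE_NAMES.get(status, f"RCODE{status}"),
        "answers": [rr.get("data") for rr in data.get("Answer", [])],
        "ede": ede,
    }
```

Fed the response above, it reports `SERVFAIL`, no answers, and EDE 22 ("No Reachable Authority").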

However, the log on our BIND 9 server shows that the request reaching the authoritative DNS server is processed and approved:

17-Sep-2024 07:11:03.660 client @0x7f401d412168 172.68.212.69#30616: UDP request
17-Sep-2024 07:11:03.660 client @0x7f401d412168 172.68.212.69#30616: using view '_default'
17-Sep-2024 07:11:03.660 client @0x7f401d412168 172.68.212.69#30616: request is not signed
17-Sep-2024 07:11:03.660 client @0x7f401d412168 172.68.212.69#30616: recursion not available (allow-recursion did not match)
17-Sep-2024 07:11:03.660 client @0x7f401d412168 172.68.212.69#30616 (primafuture.com): query 'primafuture.com/A/IN' approved
17-Sep-2024 07:11:03.660 client @0x7f401d412168 172.68.212.69#30616 (primafuture.com): set ede: info-code 18 extra-text (null)
17-Sep-2024 07:11:03.660 client @0x7f401d412168 172.68.212.69#30616 (primafuture.com): reset client
17-Sep-2024 07:11:04.160 client @0x7f401d412168 172.68.212.69#30616: UDP request
17-Sep-2024 07:11:04.160 client @0x7f401d412168 172.68.212.69#30616: using view '_default'
17-Sep-2024 07:11:04.160 client @0x7f401d412168 172.68.212.69#30616: request is not signed
17-Sep-2024 07:11:04.160 client @0x7f401d412168 172.68.212.69#30616: recursion not available (allow-recursion did not match)
17-Sep-2024 07:11:04.160 client @0x7f401d412168 172.68.212.69#30616 (primafuture.com): query 'primafuture.com/A/IN' approved
17-Sep-2024 07:11:04.160 client @0x7f401d412168 172.68.212.69#30616 (primafuture.com): set ede: info-code 18 extra-text (null)
17-Sep-2024 07:11:04.160 client @0x7f401d412168 172.68.212.69#30616 (primafuture.com): reset client
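One detail in this log is worth extracting: the same query arrives twice from the same source address and port, about 500 ms apart, and is approved both times. The sketch below (hypothetical helpers `approved_queries` and `retry_gaps`, with a regex written only for the line format shown above) pulls that out. A repeated identical query is consistent with the resolver retransmitting because it never saw the first reply, though the log alone does not prove that a reply actually left the server.

```python
import re
from datetime import datetime

# Matches the "query ... approved" lines of the BIND 9 debug log shown above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) client @\S+ "
    r"(?P<src>[\d.]+)#(?P<port>\d+) \((?P<qname>[^)]+)\): query '(?P<query>[^']+)' approved"
)


def approved_queries(log_text):
    """Return (timestamp, source IP, source port, query) for each approved query."""
    results = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%d-%b-%Y %H:%M:%S.%f")
            results.append((ts, m["src"], int(m["port"]), m["query"]))
    return results


def retry_gaps(queries):
    """Seconds between consecutive identical queries from the same source IP and port."""
    gaps = []
    for prev, cur in zip(queries, queries[1:]):
        if prev[1:] == cur[1:]:  # same source, port and question
            gaps.append((cur[0] - prev[0]).total_seconds())
    return gaps
```

Run against the log above, it finds two approved queries from 172.68.212.69#30616 and a single gap of 0.5 seconds.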

When the same DNS query is run from a web-based dig tool such as digwebinterface.com, resolution succeeds:

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.16.tuxcare.els4 <<>> +additional +nsid primafuture.com. @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52183
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; OPT=15: 00 12 ("..")
; NSID: 37 33 35 6d 31 30 31 ("735m101")
;; QUESTION SECTION:
;primafuture.com.		IN	A

;; ANSWER SECTION:
primafuture.com.	3600	IN	A	87.236.194.79

;; Query time: 262 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Sep 17 07:52:07 CEST 2024
;; MSG SIZE  rcvd: 77

However, when running the same query from my local machine using the following command:
dig +additional +nsid primafuture.com. @1.1.1.1

I get a SERVFAIL response:

; <<>> DiG 9.16.1-Ubuntu <<>> +additional +nsid primafuture.com. @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 56085
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; OPT=15: 00 16 61 74 20 64 65 6c 65 67 61 74 69 6f 6e 20 70 72 69 6d 61 66 75 74 75 72 65 2e 63 6f 6d 2e ("..at delegation primafuture.com.")
; NSID: 33 31 6d 36 38 ("31m68")
;; QUESTION SECTION:
;primafuture.com.               IN      A

;; Query time: 2007 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Sep 17 07:52:36 CEST 2024
;; MSG SIZE  rcvd: 89
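The hex dumps in the two `dig` outputs can be decoded to compare them directly. The EDE option (EDNS option code 15, RFC 8914) is a 2-byte big-endian INFO-CODE followed by optional UTF-8 EXTRA-TEXT; NSID (RFC 5001) is opaque, but Cloudflare's values here are plain ASCII. The helper names below are hypothetical. Decoded, the successful reply carries EDE 18 with no text and NSID `735m101`, while the failing one carries EDE 22 ("at delegation primafuture.com.") and NSID `31m68`, which suggests the two queries were answered by different Cloudflare servers.

```python
def decode_ede(hex_text):
    """Decode an EDNS0 Extended DNS Error option as printed by dig (e.g. '00 16 61 74 ...')."""
    raw = bytes.fromhex(hex_text)  # fromhex skips the spaces between bytes
    info_code = int.from_bytes(raw[:2], "big")
    extra_text = raw[2:].decode("utf-8")
    return info_code, extra_text


def decode_nsid(hex_text):
    """Decode an NSID option; assumes the server uses printable ASCII."""
    return bytes.fromhex(hex_text).decode("ascii")
```

For example, `decode_ede("00 12")` returns `(18, "")` and `decode_nsid("33 31 6d 36 38")` returns `"31m68"`.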

The responsibility lies with the operator of ns1.primawebhosting.cz and ns-1.primawebhosting.cz, because their DNS setup has a “single point of failure”.

But is the DNS response actually able to reach Cloudflare?

The Internet consists of many networks, connected either directly or through an intermediate (e.g. a network pays a third party for transit, to be able to reach other networks that it doesn’t connect to directly).

With so many networks and intermediate ISPs out there, something will have issues from time to time, even when the problem is neither on Cloudflare’s end nor on the end of, e.g., the operator of ns1.primawebhosting.cz and ns-1.primawebhosting.cz.

Among the RFCs that lay the groundwork for Internet standards, RFC 2182, also known as BCP 16 (Best Current Practice #16) from July 1997, recommends the following:

  1. Use between 3 and 7 name servers.
  2. Locate name servers carefully, to avoid a single point of failure (e.g. having 7 in one single city isn’t sufficient).
  3. At least one of the name servers must be topologically separate from the others (e.g. on a different ISP).

But, …

$ dig +noall +answer A ns1.primawebhosting.cz
ns1.primawebhosting.cz. 3600    IN      A       87.236.194.79
$ dig +noall +answer A ns-1.primawebhosting.cz
ns-1.primawebhosting.cz. 3600   IN      A       87.236.194.79

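Both names resolve to the same address, 87.236.194.79 (which is also the A record of primafuture.com itself), so you are not just failing #1, #2 or #3 above, but ALL of them at the same time.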
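The three RFC 2182 points above can be roughly checked from the addresses that `dig` returns for each name server. The sketch below is a hypothetical helper, not an official tool. One heuristic in it is an assumption of this sketch rather than part of RFC 2182: addresses sharing a /24 (IPv4) or /48 (IPv6) prefix are treated as the same network location. Real topological diversity (different ISPs or ASNs) cannot be judged from addresses alone.

```python
import ipaddress


def check_rfc2182(ns_addresses):
    """Rough RFC 2182 (BCP 16) sanity check.

    ns_addresses maps each name server host name to a list of its IP addresses.
    Returns a list of human-readable problems (empty if none are detected).
    """
    addresses = {ipaddress.ip_address(a) for addrs in ns_addresses.values() for a in addrs}
    # Assumed heuristic: same /24 (IPv4) or /48 (IPv6) => same network location.
    prefixes = {
        ipaddress.ip_network(f"{a}/{24 if a.version == 4 else 48}", strict=False)
        for a in addresses
    }
    problems = []
    if len(ns_addresses) < 3:
        problems.append(f"only {len(ns_addresses)} name server names (3 to 7 recommended)")
    if len(addresses) < 2:
        problems.append("all name server names resolve to a single address")
    if len(prefixes) < 2:
        problems.append("all name servers share one network prefix")
    return problems
```

With the two records looked up above, `check_rfc2182({"ns1.primawebhosting.cz": ["87.236.194.79"], "ns-1.primawebhosting.cz": ["87.236.194.79"]})` reports all three problems.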

Intermittent issues, which could very well be at a third party between the two networks involved, are hitting you here, and you are bearing the consequences of a poor decision by the operator of ns1.primawebhosting.cz and ns-1.primawebhosting.cz: not separating their name servers well enough.

This issue is a perfect example of why you need a secondary DNS server at a different ISP, one that has nothing to do with the ISP of the other name server(s), so that the DNS request can be retried somewhere else (and hopefully succeed).


Thank you for your response and the lessons learned. I know all of this, and my goal is not to work around the bug but to find out what it is. That’s why I gave the example of an unused domain we own, where only one DNS server is intentionally set up: it makes it easy to debug what’s going on there and obvious that the problem lies in the communication with that server.

My question is not how to get around this (by hosting DNS on multiple servers), but where the problem lies in the communication between this DNS server and Cloudflare: does Cloudflare send a request but never get a response, or does it get a response and the error is on Cloudflare’s side?

The UDP request clearly reaches the DNS server and the server responds, but we have no way of knowing what happens after that. The only thing we can observe is that from another country, where the request comes from a different Cloudflare server, the query succeeds and the response is as it should be.

Other DNS resolvers, such as Google Public DNS, also work as they should.

So, thanks again for the recommendations on how to work around the error, but my point is to resolve it and find out what is causing it.

Cloudflare claims they are not able to reach an authoritative DNS server.

However, the log that you posted suggests that Cloudflare (e.g. 172.68.212.69) is able to reach you.

Therefore, I would personally look at the routing from your ISP out towards e.g. 172.68.212.69, as the problem is likely the return route (i.e. the path of the DNS reply from your server towards Cloudflare).

I don’t know about “should”, though. In a perfect world, sure, but do we have a perfect world?

Your network could be up and running well, and so could the other party’s, but again, you never know with the carriers or intermediates in between.

Several Internet carriers have been involved in peering conflicts, where their users (customers, […]) have unfortunately ended up as the ultimate losers.

In several of these peering conflicts, customers of one network have been completely unable to reach the other network.

In other situations, the impact has been less severe, as there may have been an alternative route (albeit with higher latency, sometimes crossing multiple continents).

Although it would be good to identify and get a fix pushed, I’m afraid you will be wasting a lot of time on that kind of journey.

You’re welcome, and thanks to you too, for taking this feedback so nicely!

Unless you can guarantee a perfect world and perfect operations, including across unrelated third-party networks, I would say these aren’t just recommendations but actual requirements.

All I had to do was wait, and after a few hours it started working again as before 🙂

I’m just sorry I don’t know what caused the error.

Anyway, I added monitoring of Cloudflare DNS to Prometheus to keep track of it.
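The post doesn’t say how the monitoring was built, so the following is only a hypothetical sketch of one way to do it: query the same DNS-over-HTTPS JSON endpoint as the `curl` command earlier and emit lines in the Prometheus text exposition format, which could then be exposed, for instance, through node_exporter’s textfile collector. The metric names `dns_probe_success` and `dns_probe_rcode` are made up for this example; Prometheus’ blackbox_exporter, which has a DNS prober, is an alternative worth considering.

```python
import json
import urllib.parse
import urllib.request

DOH_URL = "https://1.1.1.1/dns-query"  # same endpoint as the curl command above


def fetch_doh_json(name, rtype="A", timeout=5.0):
    """Query Cloudflare's DNS-over-HTTPS JSON API and return the decoded body."""
    query = urllib.parse.urlencode({"name": name, "type": rtype})
    req = urllib.request.Request(f"{DOH_URL}?{query}", headers={"accept": "application/dns-json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def probe(name, fetch=fetch_doh_json):
    """Return Prometheus text-format lines for one resolution attempt.

    Metric names are hypothetical; -1 marks a network error, timeout or bad body.
    """
    try:
        status = int(fetch(name).get("Status", -1))
    except (OSError, ValueError):  # URLError and timeouts are OSError subclasses
        status = -1
    labels = f'resolver="1.1.1.1",name="{name}"'
    success = 1 if status == 0 else 0
    return "\n".join([
        f"dns_probe_success{{{labels}}} {success}",
        f"dns_probe_rcode{{{labels}}} {status}",
    ])
```

The `fetch` parameter is there so the formatting can be exercised without network access, e.g. `probe("primafuture.com", fetch=lambda n: {"Status": 2})` reports success `0` and rcode `2`.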

Thanks anyway 👍
