With multiple Unbound instances on different 1.6.x versions, I have a problem that occurs only when using DNS over TLS with 1.1.1.1 and 1.0.0.1. I have similar issues with AdGuard, but I didn't try to make them reproducible; they went away once I switched to 8.8.8.8 in AdGuard.
When resolving apps.mypurecloud.ie after a cold start, it always returns SERVFAIL. After a while it starts resolving, and from then on resolution is intermittent: sometimes it resolves and sometimes it doesn't.
Things which fix the issue:
- using DNS over TLS with Google
- using plain DNS with Cloudflare
While debugging, I saw the following symptoms when using Cloudflare DNS over TLS:
- .ie root servers were not resolvable
- intermittently different results, which made it hard to debug
Thanks for the report! We’re investigating.
Just as additional information: I can't reproduce it anymore today on the same machine with the same settings as yesterday for the domain mentioned.
After I found some information about failure scenarios in combination with forwarding and DNSSEC resolving, I wanted to test this.
However, now the domain dealerdirect.eu doesn't resolve anymore on that resolver while DNS over TLS is used. That might be interesting, because Cloudflare is the authoritative server in this case.
Edit: dealerdirect.eu started working again with no changes on this side.
Hey, I just wanted to confirm that I’m seeing the exact same problem. I.e. Unbound + Cloudflare over TLS = intermittent SERVFAIL. I’m using Unbound 1.10.0.
I’ve done a little further digging: the errors seem to be caused by DNSSEC failures. They go away if I disable DNSSEC support in Unbound. I think this started happening about a week ago for me.
For me, the SERVFAIL problem starts happening randomly for a single domain (at a time). The problem seems to go away after 15 minutes from when it first starts occurring. Speculating, this may be because I have the TTL for Host cache entries in Unbound set to 15 minutes. If I’m right, this would imply anyone running a caching DNS over Cloudflare is more likely to run into this error, because the problem is “sticky” for a short period of time, as opposed to immediately going away with a refresh when the response isn’t cached.
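To illustrate the speculation above, here is a toy sketch in Python. It is not how Unbound's cache is actually implemented; `TtlCache` and the fail-once upstream are hypothetical, and the 900-second TTL simply mirrors the 15 minutes mentioned. It only shows why a cached failure would look "sticky" until the entry expires.

```python
class TtlCache:
    """Toy cache that keeps any answer, including failures, until its TTL expires."""

    def __init__(self, ttl_seconds, resolve):
        self.ttl = ttl_seconds
        self.resolve = resolve   # callable: name -> answer (the upstream lookup)
        self.entries = {}        # name -> (answer, expiry time)

    def lookup(self, name, now):
        cached = self.entries.get(name)
        if cached and now < cached[1]:
            return cached[0]     # still fresh: upstream is not asked again
        answer = self.resolve(name)
        self.entries[name] = (answer, now + self.ttl)
        return answer


# Hypothetical upstream that fails once and then recovers.
answers = iter(["SERVFAIL", "NOERROR", "NOERROR"])
cache = TtlCache(ttl_seconds=900, resolve=lambda name: next(answers))

# Query at 0 s, 60 s, 899 s and 900 s after the first failure.
results = [cache.lookup("en.wikipedia.org", now=t) for t in (0, 60, 899, 900)]
print(results)
```

Without the cache, the second query would already have succeeded; with it, the failure persists until the entry expires.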
I’ve seen this problem for many domains, including popular ones like en.wikipedia.org, so I’m fairly confident this is not a DNS configuration problem with these sites.
Happy to provide any more information I can. Thanks!
Thanks for reporting! Could you run dig @1.1.1.1 ch txt id.server and report the colo where you see the problem?
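If dig isn't at hand, roughly the same query can be sent from a short script. This is a minimal sketch using only the Python standard library: `build_query` and `query_colo` are hypothetical helper names, the transaction ID is arbitrary, and the response is returned as raw bytes rather than parsed. It builds a TXT query in the CHAOS class for id.server and sends it over plain UDP.

```python
import socket
import struct


def build_query(name: str, qtype: int = 16, qclass: int = 3, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (defaults: TXT record, CHAOS class)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


def query_colo(server: str = "1.1.1.1", timeout: float = 3.0) -> bytes:
    """Send the query over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query("id.server"), (server, 53))
        response, _ = sock.recvfrom(512)
        return response
```

Calling `query_colo()` needs network access; the TXT answer is carried as plain ASCII inside the returned bytes, so printing the response is enough to read it off.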
With the caveat that I haven’t yet seen this problem today, I get the response “SAN” from that request currently. I will edit this if I see the problem again today to make a note of it and see if I get a different colo for any reason at that time.
For the record, I have both 1.1.1.1 and 1.0.0.1 set as forward-addrs in Unbound, but I get the same colo for both of them.
6 hrs later: edit to add that I have not yet seen any SERVFAIL responses today.
Thanks, we’ve noticed a few faulty instances earlier. It should not be recurring, but I’ll keep an eye on it.
I haven't seen any errors, so I can't do this now, but when it happens again I will report it.
Got a SERVFAIL just now for blog.rubenwardy.com. (Not my site, just happened to hit it in a search result.)
Edit: after some digging through the logs, I'm now thinking this specific SERVFAIL problem is pretty clearly a dupe of this: Fail to resolve wildcard domains when using DNS over TLS. The problem I described in this comment isn't intermittent, as I originally thought, and the logs and subdomain issue match the other report. (I've removed several other edits from this post, as they proved to be a rabbit trail.)