SERVFAIL answer (often/repeatedly), domain seems OK

Querying for an A record, over DoT or plain DNS, returns SERVFAIL. The rest of the info is probably not useful.

The query came from a forwarding Knot Resolver (so it has the CD bit set); switching to other target servers works, and the authoritative servers seem OK. A PCAP was provided.
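For anyone who wants to reproduce this without Knot Resolver in the path, here is a minimal sketch of a query with the RD and CD bits set, plus reading the RCODE from the reply. It is only an illustration: the helper names (`build_query`, `rcode_of`, `ask`) are made up for this example, it covers plain UDP over IPv4 only (no DoT, no TCP fallback, no response ID check), and the server and name you pass in are up to you.

```python
import socket
import struct

# Only the RCODEs relevant to this report; others are shown as numbers.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for `name` (default QTYPE 1 = A) with RD and CD set."""
    flags = 0x0100 | 0x0010  # RD (recursion desired) + CD (checking disabled)
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def rcode_of(response: bytes) -> str:
    """Return the RCODE name taken from the low 4 bits of the header flags."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, str(code))


def ask(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one query over plain UDP port 53 and return the RCODE name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(4096)
    return rcode_of(response)
```

Calling `ask(server_ip, name)` a few times in a row, before and after the TTL runs out, should show whether the SERVFAIL is tied to queries carrying the CD bit.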

/cc directly @mvavrusa again, I guess?

I’m not able to reproduce it now, but I’ll keep an eye on it if it comes back. It looks like the nameserver was unresponsive briefly earlier.

Hello @mvavrusa.
I’m the original reporter of this issue. Previously I thought it could be a problem with Knot Resolver on my Linux gateway. It doesn’t seem so. Knot Resolver gets SERVFAIL after the TTL has expired, and repeated requests on this gateway fail with SERVFAIL as well. BUT when I query from another Windows 10 machine in the same network (same public IP), Cloudflare returns the correct response. The next request from the gateway then suddenly works too! These machines have no shared cache, they use directly, that’s strange. Other subdomains of suffer from this too, but only on the Linux machine with Knot Resolver.

Another test case
nslookup -> SERVFAIL repeatedly
nslookup -> NOERROR
next nslookup -> NOERROR, until TTL is expired

pcap provided
another larger pcap

Thanks, that’s super helpful! This looks like a bug; I’ll add it to the next rollout.