Returning SERVFAIL for valid domains?

This has been happening for the last few days while using Cloudflare DNS via both DoH and plaintext.

Occasionally a request returns SERVFAIL for a valid domain, and shortly afterwards the same request returns the valid response.

Here’s a tcpdump I caught when the issue occurred trying to browse Reddit:

20:12:57.542083 IP 192.168.10.101.54911 > 192.168.10.108.53: 23282+ A? www.reddit.com. (32)
20:12:57.542689 IP 192.168.10.108.51228 > 1.0.0.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.542810 IP 192.168.10.108.51228 > 1.1.1.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.558291 IP 1.0.0.1.53 > 192.168.10.108.51228: 7745 ServFail 0/0/0 (32)
20:12:57.558402 IP 192.168.10.108.53 > 192.168.10.101.54911: 23282 ServFail 0/0/0 (32)
20:12:57.559800 IP 1.1.1.1.53 > 192.168.10.108.51228: 7745 2/0/0 CNAME reddit.map.fastly.net., A 151.101.85.140 (83)
20:13:02.658087 IP 192.168.10.101.60221 > 192.168.10.108.53: 49983+ A? www.reddit.com. (32)
20:13:02.658690 IP 192.168.10.108.37685 > 1.0.0.1.53: 54042+ A? www.reddit.com. (32)
20:13:02.674480 IP 1.0.0.1.53 > 192.168.10.108.37685: 54042 2/0/0 CNAME reddit.map.fastly.net., A 151.101.85.140 (83)

.101 is the client; .108 is dnsmasq forwarding to Cloudflare DNS.

The ServFail above caused Chrome to show a resolution error; seconds later, however, the page loaded fine.
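To make the race in the capture explicit, here is a minimal Python sketch that orders the two upstream replies for query ID 7745 by arrival time. The timestamps are copied from the tcpdump output; the label "NoError" for the successful answer is my own (tcpdump prints no status for it), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Upstream replies for query ID 7745, copied from the capture above:
# (arrival time at dnsmasq, resolver, result).
# "NoError" is an assumed label for the reply that carried the CNAME and A records.
REPLIES = [
    ("20:12:57.558291", "1.0.0.1", "ServFail"),
    ("20:12:57.559800", "1.1.1.1", "NoError"),
]

def arrival(reply):
    """Parse the tcpdump timestamp of a reply tuple."""
    return datetime.strptime(reply[0], "%H:%M:%S.%f")

def first_and_gap(replies):
    """Return the earliest reply and how many ms later the last one arrived."""
    ordered = sorted(replies, key=arrival)
    gap = arrival(ordered[-1]) - arrival(ordered[0])
    return ordered[0], gap / timedelta(milliseconds=1)

winner, gap_ms = first_and_gap(REPLIES)
print(f"first reply: {winner[1]} ({winner[2]}), other reply {gap_ms:.3f} ms later")
```

In this capture the ServFail from 1.0.0.1 arrived first and was relayed to the client, with the valid answer from 1.1.1.1 following about 1.5 ms later.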

Using Google and Quad9 works fine, so I’m stumped as to what to do.

Small update: this seems to have only ever occurred on 1.0.0.1, not 1.1.1.1.

I have removed 1.0.0.1 from the pool and will see if that fixes it.

Any ideas why only that IP might be affected?

There shouldn’t be any difference between the addresses. If I read it right, there’s a 16 ms difference between the query and the SERVFAIL, so it looks like a transient issue. Do you use Cloudflared or dnscrypt-proxy, or just dnsmasq? Which PoP are you hitting? You can check with:

    dig +short CHAOS TXT id.server @1.1.1.1
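The 16 ms figure can be checked against the capture timestamps. The following is a rough Python sketch, not a general tcpdump parser: the regular expression only targets the line layout seen in this thread, and the function name `upstream_latencies` is hypothetical. It pairs each upstream query with its reply by resolver address, local port and transaction ID.

```python
import re
from datetime import datetime

# The four upstream lines from the capture posted earlier in the thread.
CAPTURE = """\
20:12:57.542689 IP 192.168.10.108.51228 > 1.0.0.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.542810 IP 192.168.10.108.51228 > 1.1.1.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.558291 IP 1.0.0.1.53 > 192.168.10.108.51228: 7745 ServFail 0/0/0 (32)
20:12:57.559800 IP 1.1.1.1.53 > 192.168.10.108.51228: 7745 2/0/0 CNAME reddit.map.fastly.net., A 151.101.85.140 (83)
"""

# time, src ip, src port, dst ip, dst port, transaction ID, optional "+", rest
LINE = re.compile(r"^(\S+) IP (\S+)\.(\d+) > (\S+)\.(\d+): (\d+)(\+?) (.*)$")

def upstream_latencies(capture):
    """Map resolver IP -> milliseconds between query and matching reply."""
    sent = {}
    latencies = {}
    for line in capture.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        ts, src, sport, dst, dport, txid, _rd, _rest = m.groups()
        when = datetime.strptime(ts, "%H:%M:%S.%f")
        if dport == "53":                      # query going to a resolver
            sent[(dst, sport, txid)] = when
        elif sport == "53":                    # reply coming back
            key = (src, dport, txid)
            if key in sent:
                latencies[src] = (when - sent[key]).total_seconds() * 1000
    return latencies

for resolver, ms in upstream_latencies(CAPTURE).items():
    print(f"{resolver}: {ms:.3f} ms")
```

For this capture it gives roughly 15.6 ms for 1.0.0.1 and 17.0 ms for 1.1.1.1, so both replies came back in about the same time and only the result differed.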

I’ve had the same issues for a couple of days, occasional SERVFAIL replies from 1.0.0.1.

Both 1.1.1.1 and 1.0.0.1 are using the PoP “arn02”.

For a couple of hours I’ve only been using 1.1.1.1 and so far everything is working as intended, which is a bit odd since there really shouldn’t be any difference, as @mvavrusa said.
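One way to compare the two addresses directly, bypassing dnsmasq, is to send the same query to each one repeatedly and tally the reply codes. Below is a minimal sketch using only the Python standard library; the function names, the attempt count and the timeout are arbitrary choices of mine, and it speaks plain UDP only (no EDNS, no TCP fallback). Repeated queries for the same name will probably be answered from the resolver's cache, so an intermittent failure may not show up.

```python
import random
import socket
import struct
from collections import Counter

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}

def build_query(name, txid, qtype=1):
    """Build a minimal DNS query: RD set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def rcode_of(message):
    """Extract the 4-bit reply code from the flags word of a DNS message."""
    (flags,) = struct.unpack("!H", message[2:4])
    return flags & 0x000F

def probe(resolver, name, attempts=20, timeout=2.0):
    """Send `attempts` A queries over UDP and tally the outcomes."""
    tally = Counter()
    for _ in range(attempts):
        txid = random.randrange(0x10000)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_query(name, txid), (resolver, 53))
                data, _ = sock.recvfrom(512)
            except OSError:                    # includes timeouts
                tally["TIMEOUT/ERROR"] += 1
                continue
        if len(data) < 4 or data[:2] != struct.pack("!H", txid):
            tally["MISMATCH"] += 1
            continue
        code = rcode_of(data)
        tally[RCODES.get(code, f"RCODE{code}")] += 1
    return tally

# Example (needs network access):
#   for resolver in ("1.1.1.1", "1.0.0.1"):
#       print(resolver, dict(probe(resolver, "pool.ntp.org")))
```

The query built for www.reddit.com is 32 bytes, the same size tcpdump reports for the queries in the capture at the top of the thread.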

I too am connecting via the arn02 PoP, @mvavrusa.

I have run for the last 4 days using only 1.1.1.1 via DoH with dnscrypt-proxy, and all has been working well. As weird as it is, 1.0.0.1 was the only one with issues.

Whatever the fault may be, it’s still not resolved.

I’m attaching a failed query to 1.0.0.1 for pool.ntp.org as an example. Note that not all queries fail: successful queries to 1.0.0.1 for pool.ntp.org were made during the time I captured packets.

The packet was captured on April 27th, and since the problem remains I’d say it’s still relevant.

Query to 1.0.0.1:


    Frame 7135: 72 bytes on wire (576 bits), 72 bytes captured (576 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Apr 27, 2018 00:59:57.947740000 W. Europe Daylight Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1524783597.947740000 seconds
    [Time delta from previous captured frame: 0.000627000 seconds]
    [Time delta from previous displayed frame: 0.000627000 seconds]
    [Time since reference or first frame: 25556.958724000 seconds]
    Frame Number: 7135
    Frame Length: 72 bytes (576 bits)
    Capture Length: 72 bytes (576 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]

Internet Protocol Version 4, Src: 10.0.0.3, Dst: 1.0.0.1
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 58
    Identification: 0x3383 (13187)
    Flags: 0x4000, Don't fragment
        0... .... .... .... = Reserved bit: Not set
        .1.. .... .... .... = Don't fragment: Set
        ..0. .... .... .... = More fragments: Not set
        ...0 0000 0000 0000 = Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0xfc2c [validation disabled]
    [Header checksum status: Unverified]
    Source: 10.0.0.3
    Destination: 1.0.0.1

User Datagram Protocol, Src Port: 53437, Dst Port: 53
    Source Port: 53437
    Destination Port: 53
    Length: 38
    Checksum: 0x0b3b [unverified]
    [Checksum Status: Unverified]
    [Stream index: 3323]

Domain Name System (query)
    Transaction ID: 0x21de
    Flags: 0x0100 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data: Unacceptable
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
        pool.ntp.org: type A, class IN
            Name: pool.ntp.org
            [Name Length: 12]
            [Label Count: 3]
            Type: A (Host Address) (1)
            Class: IN (0x0001)

Reply from 1.0.0.1:


    Frame 7136: 72 bytes on wire (576 bits), 72 bytes captured (576 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Apr 27, 2018 00:59:57.953564000 W. Europe Daylight Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1524783597.953564000 seconds
    [Time delta from previous captured frame: 0.005824000 seconds]
    [Time delta from previous displayed frame: 0.005824000 seconds]
    [Time since reference or first frame: 25556.964548000 seconds]
    Frame Number: 7136
    Frame Length: 72 bytes (576 bits)
    Capture Length: 72 bytes (576 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]

Internet Protocol Version 4, Src: 1.0.0.1, Dst: 10.0.0.3
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 58
    Identification: 0x95a0 (38304)
    Flags: 0x4000, Don't fragment
        0... .... .... .... = Reserved bit: Not set
        .1.. .... .... .... = Don't fragment: Set
        ..0. .... .... .... = More fragments: Not set
        ...0 0000 0000 0000 = Fragment offset: 0
    Time to live: 57
    Protocol: UDP (17)
    Header checksum: 0xa10f [validation disabled]
    [Header checksum status: Unverified]
    Source: 1.0.0.1
    Destination: 10.0.0.3

User Datagram Protocol, Src Port: 53, Dst Port: 53437
    Source Port: 53
    Destination Port: 53437
    Length: 38
    Checksum: 0xeb7a [unverified]
    [Checksum Status: Unverified]
    [Stream index: 3323]

Domain Name System (response)
    Transaction ID: 0x21de
    Flags: 0x8182 Standard query response, Server failure
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 1... .... = Recursion available: Server can do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
        pool.ntp.org: type A, class IN
            Name: pool.ntp.org
            [Name Length: 12]
            [Label Count: 3]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
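For readers who want to check the flag breakdown in the two dumps, here is a small Python sketch that splits the 16-bit DNS header flags word into its fields (the RFC 1035 layout plus the AD and CD bits added for DNSSEC). The function name `decode_flags` is hypothetical.

```python
def decode_flags(flags):
    """Split the 16-bit DNS header flags word into named fields."""
    return {
        "qr": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 0x1,      # authoritative answer
        "tc": (flags >> 9) & 0x1,       # truncated
        "rd": (flags >> 8) & 0x1,       # recursion desired
        "ra": (flags >> 7) & 0x1,       # recursion available
        "z": (flags >> 6) & 0x1,        # reserved
        "ad": (flags >> 5) & 0x1,       # answer authenticated
        "cd": (flags >> 4) & 0x1,       # checking disabled
        "rcode": flags & 0xF,           # 2 = server failure
    }

query = decode_flags(0x0100)   # flags of frame 7135
reply = decode_flags(0x8182)   # flags of frame 7136
print("query:", query)
print("reply:", reply)
```

For 0x8182 this yields qr=1, rd=1, ra=1 and rcode=2, matching the "Server failure (2)" line in the reply above.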

Is this still an unsolved issue? I had issues with one of my local banks, and on closer inspection I’ve been receiving SERVFAILs:

; <<>> DiG 9.10.6 <<>> @1.1.1.1 webapps.stgeorge.com.au
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 56869
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;webapps.stgeorge.com.au.	IN	A

;; Query time: 15 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Jan 01 08:18:15 AEDT 2019
;; MSG SIZE  rcvd: 52

Debug from Cloudflare’s tool: https://Cloudflare-dns.com/help/#eyJpc0NmIjoiWWVzIiwiaXNEb3QiOiJObyIsImlzRG9oIjoiTm8iLCJyZXNvbHZlcklwLTEuMS4xLjEiOiJZZXMiLCJyZXNvbHZlcklwLTEuMC4wLjEiOiJZZXMiLCJyZXNvbHZlcklwLTI2MDY6NDcwMDo0NzAwOjoxMTExIjoiTm8iLCJyZXNvbHZlcklwLTI2MDY6NDcwMDo0NzAwOjoxMDAxIjoiTm8iLCJkYXRhY2VudGVyTG9jYXRpb24iOiJTWUQiLCJpc3BOYW1lIjoiQ2xvdWRmbGFyZSIsImlzcEFzbiI6IjEzMzM1In0=

Just curious, does it work now?

It works for me from the United States, even though webapps.stgeorge.com.au is buggy and invalid in multiple ways.

No, still broken from what I can see. Other Aussies I’ve reached out to are getting similar issues using 1.1.1.1 for any subdomain under stgeorge.com.au. Very odd. Historically it has worked fine.