Returning SERVFAIL for valid domains?


#1

This has been happening for the last few days while using Cloudflare DNS via both DOH and plaintext.

Occasionally a request will return SERVFAIL for a valid domain, and then return the valid response shortly afterwards.

Here’s a tcpdump I caught when the issue occurred trying to browse Reddit:

20:12:57.542083 IP 192.168.10.101.54911 > 192.168.10.108.53: 23282+ A? www.reddit.com. (32)
20:12:57.542689 IP 192.168.10.108.51228 > 1.0.0.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.542810 IP 192.168.10.108.51228 > 1.1.1.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.558291 IP 1.0.0.1.53 > 192.168.10.108.51228: 7745 ServFail 0/0/0 (32)
20:12:57.558402 IP 192.168.10.108.53 > 192.168.10.101.54911: 23282 ServFail 0/0/0 (32)
20:12:57.559800 IP 1.1.1.1.53 > 192.168.10.108.51228: 7745 2/0/0 CNAME reddit.map.fastly.net., A 151.101.85.140 (83)
20:13:02.658087 IP 192.168.10.101.60221 > 192.168.10.108.53: 49983+ A? www.reddit.com. (32)
20:13:02.658690 IP 192.168.10.108.37685 > 1.0.0.1.53: 54042+ A? www.reddit.com. (32)
20:13:02.674480 IP 1.0.0.1.53 > 192.168.10.108.37685: 54042 2/0/0 CNAME reddit.map.fastly.net., A 151.101.85.140 (83)

Here .101 is the client and .108 is dnsmasq forwarding to Cloudflare DNS.
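
To see which resolver produces the failures over a longer capture, the tcpdump output can be tallied per source address. This is only a minimal Python sketch, not a tcpdump or dnsmasq feature: the regular expression assumes the exact text layout of the lines above, and `servfail_by_server` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line layout, copied from the capture above:
# "HH:MM:SS.ffffff IP src.port > dst.port: ID[+] rest"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r"(?P<txid>\d+)(?P<rd>\+?) (?P<rest>.*)$"
)


def servfail_by_server(lines):
    """Count replies per source address, split into 'servfail' and 'ok'."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("sport") != "53":
            continue  # skip queries and anything that is not a reply from port 53
        status = "servfail" if m.group("rest").startswith("ServFail") else "ok"
        counts[(m.group("src"), status)] += 1
    return counts


CAPTURE = """\
20:12:57.542083 IP 192.168.10.101.54911 > 192.168.10.108.53: 23282+ A? www.reddit.com. (32)
20:12:57.542689 IP 192.168.10.108.51228 > 1.0.0.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.542810 IP 192.168.10.108.51228 > 1.1.1.1.53: 7745+ A? www.reddit.com. (32)
20:12:57.558291 IP 1.0.0.1.53 > 192.168.10.108.51228: 7745 ServFail 0/0/0 (32)
20:12:57.558402 IP 192.168.10.108.53 > 192.168.10.101.54911: 23282 ServFail 0/0/0 (32)
20:12:57.559800 IP 1.1.1.1.53 > 192.168.10.108.51228: 7745 2/0/0 CNAME reddit.map.fastly.net., A 151.101.85.140 (83)
20:13:02.658087 IP 192.168.10.101.60221 > 192.168.10.108.53: 49983+ A? www.reddit.com. (32)
20:13:02.658690 IP 192.168.10.108.37685 > 1.0.0.1.53: 54042+ A? www.reddit.com. (32)
20:13:02.674480 IP 1.0.0.1.53 > 192.168.10.108.37685: 54042 2/0/0 CNAME reddit.map.fastly.net., A 151.101.85.140 (83)
""".splitlines()

counts = servfail_by_server(CAPTURE)
for (server, status), n in sorted(counts.items()):
    print(server, status, n)
```

On this capture the tally shows one ServFail and one good answer from 1.0.0.1, one good answer from 1.1.1.1, and the ServFail that dnsmasq (.108) relayed to the client roughly 1.4 ms before the good answer from 1.1.1.1 arrived.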

The ServFail above caused Chrome to show a resolution error, but seconds later the page would load fine.

Using Google and Quad9 works fine, so I’m stumped as to what to do.


#2

Small update: this seems to have only ever occurred on 1.0.0.1, not 1.1.1.1.

I have removed 1.0.0.1 from the pool and will see if that fixes it.

Any ideas why only that IP might be affected?


#3

There shouldn’t be any difference between the addresses. If I read it right, there’s a 16 ms difference between the query and the SERVFAIL, so it looks like a transient issue. Do you use cloudflared or dnscrypt-proxy, or just dnsmasq? Which PoP are you hitting? You can check with:

    dig +short CHAOS TXT id.server @1.1.1.1
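
The 16 ms figure can be checked directly against the timestamps in the tcpdump from the first post. The snippet below is a small Python sketch with a hypothetical helper `delta_ms`; it assumes both timestamps fall on the same day, since tcpdump's default format carries no date.

```python
from datetime import datetime

TS_FORMAT = "%H:%M:%S.%f"  # tcpdump's default time-of-day format


def delta_ms(start, end):
    """Milliseconds between two tcpdump timestamps on the same day."""
    t0 = datetime.strptime(start, TS_FORMAT)
    t1 = datetime.strptime(end, TS_FORMAT)
    return (t1 - t0).total_seconds() * 1000.0


# Timestamps copied from the capture in post #1.
servfail_rtt = delta_ms("20:12:57.542689", "20:12:57.558291")  # query to 1.0.0.1 -> ServFail
answer_rtt = delta_ms("20:12:57.542810", "20:12:57.559800")    # query to 1.1.1.1 -> answer

print(f"1.0.0.1 ServFail after {servfail_rtt:.3f} ms")
print(f"1.1.1.1 answer after {answer_rtt:.3f} ms")
```

The SERVFAIL from 1.0.0.1 came back after about 15.6 ms and the valid answer from 1.1.1.1 after about 17.0 ms, so the failure was returned at normal round-trip speed rather than after a timeout.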


#4

I’ve had the same issues for a couple of days, occasional SERVFAIL replies from 1.0.0.1.

Both 1.1.1.1 and 1.0.0.1 are using the PoP “arn02”.

For a couple of hours I’ve only been using 1.1.1.1 and so far everything is working as intended, which is a bit odd since there really shouldn’t be any difference, as @mvavrusa said.


#5

@mvavrusa I too am connecting via the arn02 PoP.

After running for the last 4 days using only 1.1.1.1 via DOH with dnscrypt-proxy, everything has been working well. As weird as it is, 1.0.0.1 was the only one with issues.


#6

Whatever the fault may be, it’s still not resolved.

I’m attaching a failed query to 1.0.0.1 for pool.ntp.org as an example. Note that not all queries fail: successful queries to 1.0.0.1 for pool.ntp.org were made during the same capture.

The packet was captured on April 27th, and since the problem remains I’d say it’s still relevant.

Query to 1.0.0.1:


Frame 7135: 72 bytes on wire (576 bits), 72 bytes captured (576 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Apr 27, 2018 00:59:57.947740000 W. Europe Daylight Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1524783597.947740000 seconds
    [Time delta from previous captured frame: 0.000627000 seconds]
    [Time delta from previous displayed frame: 0.000627000 seconds]
    [Time since reference or first frame: 25556.958724000 seconds]
    Frame Number: 7135
    Frame Length: 72 bytes (576 bits)
    Capture Length: 72 bytes (576 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]

Internet Protocol Version 4, Src: 10.0.0.3, Dst: 1.0.0.1
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 58
    Identification: 0x3383 (13187)
    Flags: 0x4000, Don't fragment
        0... .... .... .... = Reserved bit: Not set
        .1.. .... .... .... = Don't fragment: Set
        ..0. .... .... .... = More fragments: Not set
        ...0 0000 0000 0000 = Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0xfc2c [validation disabled]
    [Header checksum status: Unverified]
    Source: 10.0.0.3
    Destination: 1.0.0.1

User Datagram Protocol, Src Port: 53437, Dst Port: 53
    Source Port: 53437
    Destination Port: 53
    Length: 38
    Checksum: 0x0b3b [unverified]
    [Checksum Status: Unverified]
    [Stream index: 3323]

Domain Name System (query)
    Transaction ID: 0x21de
    Flags: 0x0100 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data: Unacceptable
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
        pool.ntp.org: type A, class IN
            Name: pool.ntp.org
            [Name Length: 12]
            [Label Count: 3]
            Type: A (Host Address) (1)
            Class: IN (0x0001)

Reply from 1.0.0.1:


Frame 7136: 72 bytes on wire (576 bits), 72 bytes captured (576 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Apr 27, 2018 00:59:57.953564000 W. Europe Daylight Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1524783597.953564000 seconds
    [Time delta from previous captured frame: 0.005824000 seconds]
    [Time delta from previous displayed frame: 0.005824000 seconds]
    [Time since reference or first frame: 25556.964548000 seconds]
    Frame Number: 7136
    Frame Length: 72 bytes (576 bits)
    Capture Length: 72 bytes (576 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]

Internet Protocol Version 4, Src: 1.0.0.1, Dst: 10.0.0.3
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 58
    Identification: 0x95a0 (38304)
    Flags: 0x4000, Don't fragment
        0... .... .... .... = Reserved bit: Not set
        .1.. .... .... .... = Don't fragment: Set
        ..0. .... .... .... = More fragments: Not set
        ...0 0000 0000 0000 = Fragment offset: 0
    Time to live: 57
    Protocol: UDP (17)
    Header checksum: 0xa10f [validation disabled]
    [Header checksum status: Unverified]
    Source: 1.0.0.1
    Destination: 10.0.0.3

User Datagram Protocol, Src Port: 53, Dst Port: 53437
    Source Port: 53
    Destination Port: 53437
    Length: 38
    Checksum: 0xeb7a [unverified]
    [Checksum Status: Unverified]
    [Stream index: 3323]

Domain Name System (response)
    Transaction ID: 0x21de
    Flags: 0x8182 Standard query response, Server failure
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 1... .... = Recursion available: Server can do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
        pool.ntp.org: type A, class IN
            Name: pool.ntp.org
            [Name Length: 12]
            [Label Count: 3]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
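
For anyone reading the raw flag words rather than the Wireshark breakdown, the two values in the dumps above (0x0100 for the query, 0x8182 for the reply) can be unpacked with a few shifts. This is a minimal Python sketch following the standard DNS header bit layout; `decode_dns_flags` and the `RCODES` table are hypothetical names, and the table lists only the most common reply codes.

```python
# Common DNS reply codes; this table is deliberately incomplete.
RCODES = {
    0: "NoError",
    1: "FormErr",
    2: "ServFail",
    3: "NXDomain",
    4: "NotImp",
    5: "Refused",
}


def decode_dns_flags(flags):
    """Split the 16-bit DNS header flags word into its individual fields."""
    return {
        "qr": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 0x1,      # authoritative answer
        "tc": (flags >> 9) & 0x1,       # truncated
        "rd": (flags >> 8) & 0x1,       # recursion desired
        "ra": (flags >> 7) & 0x1,       # recursion available
        "z": (flags >> 6) & 0x1,        # reserved
        "ad": (flags >> 5) & 0x1,       # authenticated data
        "cd": (flags >> 4) & 0x1,       # checking disabled
        "rcode": flags & 0xF,           # reply code
    }


query = decode_dns_flags(0x0100)
reply = decode_dns_flags(0x8182)

print("query:", query)
print("reply:", reply, RCODES.get(reply["rcode"], "other"))
```

Decoding 0x8182 gives a response with recursion desired and recursion available set and reply code 2 (ServFail), which matches the Wireshark output for frame 7136.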