Can’t reliably use Cloudflare’s DNS with the Palo Alto DNS Proxy because resolving some host names will time out.
What steps have you taken to resolve the issue?
I have used dig to confirm that the UDP response from 1.1.1.1 is not usable for some hosts, and that dig then retries successfully over TCP. Palo Alto’s DNS Proxy, however, does not retry over TCP. As a workaround, I am pointing the Palo Alto DNS Proxy at Google’s 8.8.8.8 to resolve *.awsapps.com until this issue is resolved.
What feature, service or problem is this related to?
Nameservers
What are the steps to reproduce the issue?
If I use a Cloudflare resolver such as 1.1.1.1 as the upstream for Palo Alto’s DNS Proxy, then trying to resolve d-9067e46b0f.awsapps.com times out.
We are using the Palo Alto DNS Proxy as our internal DNS resolver on our LAN. I have a ticket open with Palo Alto regarding this issue, and hopefully we can get to the bottom of it… but their DNS Proxy is pretty basic. It doesn’t coordinate with the Palo Alto DHCP server, so client host names do not automatically get a DNS entry. I’ve had to create DHCP reservations for hosts and then manually create static DNS entries in the DNS Proxy for each host, one by one. There is no indication their DNS Proxy supports DNS-over-TLS, DNS-over-HTTPS, or DNS-over-QUIC. If they do support DNS over TCP, they’re not handling it properly: as I said, when a UDP query doesn’t return the data the Palo Alto DNS Proxy was expecting, the client just ends up with a DNS query that times out.
What is happening is none of those things. When a response is too large for a single UDP packet, the resolver is expected to retry the query over TCP. I can’t recall which RFC spells that out, but it’s been a feature of standard DNS for decades.
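For anyone who wants to see that fallback outside of dig, here is a minimal Python sketch of it using only the standard library. It assumes the server signals an oversized answer by setting the TC (truncated) bit, which is defined in RFC 1035, and then retries over TCP with the two-byte length prefix that DNS over TCP uses (see RFC 7766). The function names (`build_query`, `is_truncated`, `recv_exact`, `resolve`) are made up for this example, the query carries no EDNS option, and nothing here is taken from Palo Alto’s or Cloudflare’s implementation.

```python
import socket
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags field


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (no EDNS) for `name`; qtype 1 is an A record."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE and QCLASS (1 = IN)
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit is set in the response header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a TCP socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the full message arrived")
        data += chunk
    return data


def resolve(name: str, server: str = "1.1.1.1", timeout: float = 3.0) -> bytes:
    """Query over UDP first; if the reply is truncated, retry once over TCP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(4096)
    if not is_truncated(response):
        return response
    # DNS over TCP prefixes each message with its length as a 16-bit integer.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

Calling `resolve("d-9067e46b0f.awsapps.com")` returns the raw response bytes. A proxy that stops after the UDP step and never does the TCP step would behave much like the timeout described in this thread, though that is only a guess about what the Palo Alto is doing.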
If it isn’t a configuration issue on the Palo you should literally yeet the thing into the river because it’s far more useful as a boat anchor than as an IT tool.
I didn’t mean to imply that the lack of the other DNS query methods was the source of the problem. I mentioned it simply to indicate the lack of development of the Palo Alto DNS Proxy; it feels like an afterthought of a feature. I don’t think the service gets much attention, because Palo Alto firewalls are expensive devices, which means a lot of the customers who use them have other infrastructure running DNS internally and don’t rely on the Palo Alto DNS Proxy. People aren’t buying a PA firewall for its DNS Proxy abilities. It probably sees more use relaying internal DNS records entirely inside customer environments than it does doing public record lookups for hosts inside those networks (which is where we’re encountering this issue).
Palo Alto support, as expensive as it is, is outsourced to a call center in India where the people who respond to tickets went through training courses and don’t have that much real-world experience, so dealing with them is frustrating when it comes to bugs like this in their feature set.
Is there a way to figure out why Cloudflare’s DNS returns more data for that lookup than Google’s DNS, other than Wireshark?
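One option that avoids a packet capture is to ask both resolvers the same question over TCP, so nothing gets truncated, and compare the header of each reply. Below is a rough Python sketch of that idea using only the standard library. `encode_query`, `fetch_tcp`, and `summarize` are names invented for this example, and the query sends no EDNS option, so the sizes it reports may not match exactly what the Palo Alto sees.

```python
import socket
import struct


def encode_query(name: str, qtype: int = 1) -> bytes:
    """Encode a minimal DNS query; qtype 1 is an A record."""
    header = struct.pack("!HHHHHH", 0x4242, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def fetch_tcp(name: str, server: str, timeout: float = 3.0) -> bytes:
    """Send the query over TCP and return the complete raw response."""
    query = encode_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as conn:
        # DNS over TCP prefixes each message with a 16-bit length.
        conn.sendall(struct.pack("!H", len(query)) + query)
        stream = conn.makefile("rb")
        (length,) = struct.unpack("!H", stream.read(2))
        return stream.read(length)


def summarize(response: bytes) -> dict:
    """Report size, TC flag, RCODE, and section counts from a raw DNS message."""
    _id, flags, _qd, answers, authority, additional = struct.unpack(
        "!HHHHHH", response[:12]
    )
    return {
        "bytes": len(response),
        "truncated": bool(flags & 0x0200),
        "rcode": flags & 0x000F,
        "answers": answers,
        "authority": authority,
        "additional": additional,
    }
```

Running `summarize(fetch_tcp("d-9067e46b0f.awsapps.com", "1.1.1.1"))` and then the same call with `"8.8.8.8"` gives two small dictionaries to compare side by side. A difference in the `answers` count would suggest the two resolvers are handing back different record sets for that name, while a difference only in `bytes` would point at something else in the message; the sketch does not decode the records themselves.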