Some VMware subdomains not reachable

I ran into this issue as of Monday 17.1.2022 when I could not reach the following VMware sub-domains anymore:
https://kb.vmware.com
https://console.cloud.vmware.com

I thought it might be temporary, but it still doesn’t work when I use 1.0.0.1 as my DNS server (for some reason my ISP blocks 1.1.1.1, but I assume 1.0.0.1 behaves the same way).

I’ll add to this. We use 1.1.1.1 as a forwarder and our sysadmins just informed me they cannot access https://customerconnect.vmware.com/ to download VMware software either. Came to the community and found this thread.

customerconnect.vmware.com returns a “SERVFAIL” response

We have a paid version of Cloudflare so I tried to create a ticket for this. The bot closed the ticket because it is related to the resolver and suggested I post to the community instead (which was already done). We can’t be the only ones dealing with this.

A couple of web DNS propagation tools I tested with are also showing Cloudflare as being unable to resolve kb.vmware.com or customerconnect.vmware.com. Additionally, I used a VPN service to test 1.1.1.1 from multiple points around the world with the same result. I’m located on the US east coast. Screenshot of one example showing a failure in Australia. Also tested from points in Japan and the Netherlands. Every other major DNS resolver I test with works fine. This doesn’t appear to be a localized issue.

Have the same issue.

Unable to resolve kb.vmware.com, and unable to resolve blogs.vmware.com. But I’m able to resolve docs.vmware.com

I looked up the name servers for VMware.com to see if the authoritative name servers are providing the same response. It seems that Cloudflare’s DNS resolvers are providing the same response as the name servers controlled by VMware.

As this seems to be an “upstream” issue related to those other name servers I don’t think there is much Cloudflare can do beyond forward you on to VMware or their DNS provider.

Command:
nslookup -type=ns vmware.com

Output:

Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
vmware.com	nameserver = dns1.p05.nsone.net.
vmware.com	nameserver = dns2.p05.nsone.net.
vmware.com	nameserver = dns3.p05.nsone.net.
vmware.com	nameserver = dns4.p05.nsone.net.
vmware.com	nameserver = ns01.vmwdns.com.
vmware.com	nameserver = ns02.vmwdns.com.
vmware.com	nameserver = ns03.vmwdns.com.
vmware.com	nameserver = ns04.vmwdns.com.

When asking those name servers for the IP address of those subdomains I get a SERVFAIL from those authoritative name servers.

nslookup kb.vmware.com dns1.p05.nsone.net

Output:

;; Got SERVFAIL reply from 198.51.44.5, trying next server
Server:		dns1.p05.nsone.net
Address:	2620:4d:4000:6259:7:5:0:1#53

** server can't find kb.vmware.com: SERVFAIL
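The same direct check can be sketched in Python using only the standard library, which avoids depending on how nslookup formats its output. This is a minimal sketch, not a full DNS client: the helper names (build_query, rcode_of, ask) are my own, it sends a single UDP query with recursion turned off, and it reads nothing but the response code from the reply header.

```python
import socket
import struct

# Response codes from the DNS header (RFC 1035).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a one-question DNS query (qtype 1 = A record, class IN)."""
    # Flags are all zero: a plain query with recursion NOT requested,
    # which is what you want when talking to an authoritative server.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response: bytes) -> str:
    """Return the response code stored in the low 4 bits of the flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def ask(server_ip: str, name: str, timeout: float = 3.0) -> str:
    """Send one UDP query to server_ip and return the response code."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        response, _ = sock.recvfrom(4096)
    return rcode_of(response)


# Example (needs network access); 198.51.44.5 is the address shown in the
# nslookup output above:
#   print(ask("198.51.44.5", "kb.vmware.com"))
```

Keep in mind that a response code from one server only tells you what that server says about that exact name; it does not follow CNAMEs into zones hosted elsewhere.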

This issue isn’t happening if we use another DNS provider (e.g 8.8.8.8).

Not sure why other resolvers can resolve these names but Cloudflare can’t.

See the nslookup output below. The NS results from 8.8.8.8 are the same as from 1.1.1.1 when resolving vmware.com,

but 8.8.8.8 is able to resolve kb.vmware.com while 1.1.1.1 is not.

C:\Users\YYYY>nslookup kb.vmware.com
Server: one.one.one.one
Address: 1.1.1.1

*** one.one.one.one can’t find kb.vmware.com: Server failed

C:\Users\YYYY>nslookup kb.vmware.com 8.8.8.8
Server: dns.google
Address: 8.8.8.8

Non-authoritative answer:
Name: e751.dscx.akamaiedge.net
Addresses: 2600:1402:1400:387::2ef
2600:1402:1400:384::2ef
23.48.88.28
Aliases: kb.vmware.com
ikb.cdnswitch.vmware.com
s751x.vmware.com.edgekey.net

C:\Users\YYYY>nslookup -type=ns vmware.com 8.8.8.8
Server: dns.google
Address: 8.8.8.8

Non-authoritative answer:
vmware.com nameserver = dns1.p05.nsone.net
vmware.com nameserver = dns2.p05.nsone.net
vmware.com nameserver = dns3.p05.nsone.net
vmware.com nameserver = dns4.p05.nsone.net
vmware.com nameserver = ns01.vmwdns.com
vmware.com nameserver = ns02.vmwdns.com
vmware.com nameserver = ns03.vmwdns.com
vmware.com nameserver = ns04.vmwdns.com

It seems the cache on Google’s DNS servers has not expired for those domains yet. Once Google’s DNS servers (8.8.8.8) reach out to the authoritative name servers dns1.p05.nsone.net and ns01.vmwdns.com again, they will also start replying with SERVFAIL for the vmware.com records.

DNS is a tiered caching system where using a downstream provider like Google or Cloudflare often means you are receiving a cached response.

Looking up the name server records for a domain and querying those name servers directly ensures you have the most up-to-date information from the organization or individual that controls the domain. Example: nslookup kb.vmware.com ns01.vmwdns.com

If you look up kb.vmware.com against the nsone.net and vmwdns.com name servers you will also receive SERVFAIL responses from them. It seems VMware may have accidentally (or purposely) removed some of their zones from public DNS.

Have you tried reaching out to the VMware organization to ask about this issue?

Hi,

I reproduced this issue; only resolution via Cloudflare DNS is affected.
I opened an SR with VMware to inform them.

I am not seeing the same results as you, but I do follow what you are saying and ultimately it does still seem that the issue lies with VMware.

I switched from nslookup to dig and it hinted at another issue. Running dig @1.1.1.1 kb.vmware.com I get the SERVFAIL response, but it also says "failed to verify ikb.cdnswitch.vmware.com. CNAME". Using dig with any other public resolver does not return the verification issue. I could be wrong here, but it seems that maybe Cloudflare is actually doing some verification steps that other public resolvers are not? Possibly following the CNAMEs all the way down and checking them against the original name servers?

To add to what you originally posted, I was able to run nslookup kb.vmware.com. ns01.vmwdns.com with no problems. It doesn’t return a “SERVFAIL”, but instead returns a CNAME of ikb.cdnswitch.vmware.com. When I repeat that step and resolve this name against VMware’s name servers, I get another CNAME, s751x.vmware.com.edgekey.net, and this is where I see a problem. When I resolve that against one of VMware’s name servers, I get a “REFUSED” response.
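The hop-by-hop walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: follow_cnames and fake_vmware_ns are names I made up, and the lookup is a dictionary that replays the answers quoted in this post rather than a live query. A real lookup function with the same signature could be swapped in.

```python
from typing import Callable, List, Optional, Tuple

# A lookup returns (rcode, cname_target). cname_target is None when the
# answer is not a CNAME (for example an address record, or an error).
Lookup = Callable[[str], Tuple[str, Optional[str]]]


def follow_cnames(name: str, lookup: Lookup,
                  max_hops: int = 8) -> List[Tuple[str, str]]:
    """Ask the same server about each CNAME target in turn.

    Returns the list of (name, rcode) pairs visited, stopping at the
    first non-CNAME answer, the first error, or after max_hops.
    """
    trail = []
    current = name
    for _ in range(max_hops):
        rcode, target = lookup(current)
        trail.append((current, rcode))
        if rcode != "NOERROR" or target is None:
            break
        current = target
    return trail


# Hypothetical stand-in for one VMware name server, replaying the
# answers reported in this thread. It refuses names outside its zones.
VMWARE_ZONE = {
    "kb.vmware.com": "ikb.cdnswitch.vmware.com",
    "ikb.cdnswitch.vmware.com": "s751x.vmware.com.edgekey.net",
}


def fake_vmware_ns(name: str) -> Tuple[str, Optional[str]]:
    if name in VMWARE_ZONE:
        return ("NOERROR", VMWARE_ZONE[name])
    return ("REFUSED", None)


for hop, rcode in follow_cnames("kb.vmware.com", fake_vmware_ns):
    print(f"{hop}: {rcode}")
# kb.vmware.com: NOERROR
# ikb.cdnswitch.vmware.com: NOERROR
# s751x.vmware.com.edgekey.net: REFUSED
```

The last hop is refused because the name lives under edgekey.net, which that server is not authoritative for; a recursive resolver would go to the edgekey.net name servers for it instead.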

All of that to say, as you mentioned, this seems to point at some sort of VMware DNS issue. Additionally it seems that Cloudflare is doing something (presumably a good thing) that other DNS resolvers are not. That initially seemed to point to Cloudflare as the source of the problem instead of the actual source.

Thanks for your response and hopefully VMware will get this sorted out quickly!

As always, I’m happy to help and hope that you’re able to find a resolution!

I always refer back to this silly haiku/meme in these scenarios.

It’s better to use other public resolvers until this issue is resolved on Cloudflare’s open resolvers.

Querying NS1’s name servers directly will not give a full answer, since the record involves a CNAME hosted by another DNS service provider; that is expected. A provider’s authoritative name servers only answer for the zones they are authoritative for and do not resolve records recursively.

Hi! Sorry about the issues with some vmware.com subdomains. It seems like in some cases the unsigned child delegation isn’t detected correctly. I added a workaround, so it should resolve now while we look into this.

Can confirm that the impacted VMware subdomains are now resolving for us properly.

For sake of clarity (and to learn something), is this technically an issue on Cloudflare’s side and nothing to do with VMware then?

Yes indeed, this was an issue on Cloudflare’s side: in some cases the resolver failed to detect the transition between a DNSSEC-signed parent zone and an unsigned child zone.
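To illustrate what a signed-to-unsigned transition means for a validating resolver, here is a toy model of the general DNSSEC rule as described in RFC 4035. It is not Cloudflare's resolver code. The Delegation class, the chain_status function, and the example zone names are all made up for illustration, and real validation involves checking signatures and NSEC/NSEC3 proofs that this sketch reduces to two booleans.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Delegation:
    zone: str
    ds_present: bool         # parent publishes a DS record for this zone
    ds_absence_proven: bool  # parent proves (NSEC/NSEC3) that no DS exists


def chain_status(chain: List[Delegation]) -> str:
    """Classify a delegation chain, walking from the root downwards."""
    for delegation in chain:
        if delegation.ds_present:
            # The chain of trust continues into this zone.
            continue
        if delegation.ds_absence_proven:
            # Validly unsigned from here down: answers are accepted
            # without DNSSEC validation.
            return "insecure"
        # Neither a DS nor a proof of its absence: the validator cannot
        # trust the answer and a resolver replies with SERVFAIL.
        return "bogus"
    return "secure"


# Hypothetical zones: a signed parent delegating to an unsigned child.
signed_parent = Delegation("parent.example", True, False)
unsigned_child = Delegation("child.parent.example", False, True)
undetected_child = Delegation("child.parent.example", False, False)

print(chain_status([signed_parent, unsigned_child]))    # insecure
print(chain_status([signed_parent, undetected_child]))  # bogus
```

In this model, a resolver that misses the proof that the child is unsigned ends up in the "bogus" branch and returns SERVFAIL, even though the zone itself is fine.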

Apologies if it’s not the same issue, but we were seeing the exact same response, SERVFAIL, with domains belonging to Google Cloud Endpoints (see the thread “1.1.1.1 fails to resolve a Google endpoint, RRSIGs are missing etc.”). We tried our best to troubleshoot.

The problem with 1.1.1.1 fixed itself eventually, but could you comment whether this is the same problem? This would mean that you fixed something on Cloudflare’s end, the same fix applies for our case as well, and we can find ourselves on solid ground again. We would like to continue using 1.1.1.1 as a dependable resolver.

Hi! Sorry, I missed the other thread. I checked with an older revision of the resolver and it’s the same problem as this one with a transition between signed parent and unsigned child under some particular conditions.
