Dropped A Records

I’ve been using Cloudflare as the external DNS provider for a Kubernetes cluster. A test application that I threw together has been running without interruption for the past week or so, but all of a sudden the A record for the domain no longer resolves, even though the proper IP is still in my Cloudflare dashboard. I thought it might be some local problem, but I checked via MX Toolbox and the records don’t show up there either.

➜  ~ dig NS nanosleep.cloud +short @
➜  ~ dig ping.nanosleep.cloud +short @norm.ns.cloudflare.com
# nothing returned :(

➜  ~ curl -H "Host: ping.nanosleep.cloud" -I -k
HTTP/1.1 200 OK
Server: nginx/1.17.8
Date: Tue, 28 Jul 2020 14:30:00 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 365
Connection: keep-alive
Vary: Accept-Encoding
Last-Modified: Sat, 25 Jul 2020 07:03:49 GMT
Set-Cookie: _session=MTU5NTk0NjYwMHxEdi1CQkFFQ180SUFBUkFCRUFBQUtQLUNBQUVHYzNSeWFXNW5EQTBBQzE5ZmJHRnpkRjkxYzJWa0JXbHVkRFkwQkFZQV9MNUFiTkE9fEdrwpBMsELV5WAYHO-Dk0JUJswK80_38fxq3gvspuPi; Path=/; HttpOnly; Secure; SameSite=Lax
Strict-Transport-Security: max-age=15724800; includeSubDomains

Not sure exactly what’s going on, but any help would be appreciated.

So these records worked for the past week and stopped working without any changes to them?

Have you already tried dropping them and re-creating them? In any case, I’d open a support ticket too.

I just found the Cloudflare audit log. It looks like the external-dns Cloudflare provider might be re-creating the records every minute or so, and I may have hit some throttling limit in Cloudflare. I found this related issue on GitHub: https://github.com/kubernetes-sigs/external-dns/issues/992

I’ll try upgrading the provider to make the delete/adds go away. Are throttling or API limits for Cloudflare DNS operations documented anywhere? Also, is there any way to see in the UI whether I’m running up against those limits?

There is no external DNS provider. If you have something using the API, you had better stop that.

I believe the API limit is 1200 requests per five minutes. However, if you continuously update these records, I could imagine you running into such issues.

So, as I mentioned, I am using the Kubernetes external-dns project that I linked to previously, with Cloudflare as the DNS provider. So, yes, there is an external DNS provider. It’s not an official Cloudflare product, but it uses the Cloudflare API to create and update DNS records based on annotations on a Kubernetes Ingress.

Looking deeper into the audit log, I don’t believe this is hitting the API limits. The records were getting recreated every minute (one delete and one add), so that is ~10 API requests every 5 minutes.
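As a rough sanity check of that arithmetic, here is a minimal sketch. The 1200-per-five-minutes figure is the one quoted earlier in this thread, not something I have verified, and the two-calls-per-sync count is my reading of the audit log. The function name is made up for illustration.

```python
# Rough check of request volume against the limit quoted in this thread.
# Assumptions: one delete + one add per sync, one sync per minute.

def requests_per_window(calls_per_sync: int, sync_interval_s: int, window_s: int = 300) -> int:
    """Number of API calls made during one rate-limit window."""
    syncs = window_s // sync_interval_s
    return calls_per_sync * syncs

API_LIMIT_PER_WINDOW = 1200  # figure quoted in this thread, treat as approximate

used = requests_per_window(calls_per_sync=2, sync_interval_s=60)
print(f"{used} of {API_LIMIT_PER_WINDOW} requests per window")
```

Under those assumptions the usage is nowhere near the quoted limit, which is why throttling looks unlikely to me.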

After digging in more, I think this is what happened: the older version of external-dns was doing a delete and then an add on a regular interval, and one of Cloudflare’s caching nameservers ended up caching an NXDOMAIN response while the records were gone. So even though it looks like my domain “never went away”, the negative answer got cached during the short interval where the record was being recreated.
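To illustrate that hypothesis (it is only a hypothesis), here is a toy model of a caching resolver. It is not how any real resolver is implemented. The class name, the record name `ping.example.test`, the address, and both TTL values are invented for the example; in real DNS the negative-caching TTL is derived from the zone’s SOA record (RFC 2308).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class ToyResolverCache:
    """Caches positive and negative answers until their TTL expires."""
    negative_ttl_s: int  # assumed value, normally derived from the SOA record
    positive_ttl_s: int  # assumed value
    _entries: Dict[str, Tuple[Optional[str], float]] = field(default_factory=dict)

    def resolve(self, name: str, zone: Dict[str, str], now: float) -> Optional[str]:
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # served from cache, possibly a stale NXDOMAIN
        answer = zone.get(name)  # None models an NXDOMAIN from the authoritative side
        ttl = self.positive_ttl_s if answer is not None else self.negative_ttl_s
        self._entries[name] = (answer, now + ttl)
        return answer

zone = {"ping.example.test": "192.0.2.10"}
cache = ToyResolverCache(negative_ttl_s=1800, positive_ttl_s=300)

del zone["ping.example.test"]                      # record deleted by the sync
during_gap = cache.resolve("ping.example.test", zone, now=0.5)
zone["ping.example.test"] = "192.0.2.10"           # record re-created a second later
after_gap = cache.resolve("ping.example.test", zone, now=60.0)
after_expiry = cache.resolve("ping.example.test", zone, now=1801.0)

print(during_gap, after_gap, after_expiry)
```

In this model a single query that lands inside the one-second gap is enough to keep the name unresolvable through that cache until the negative TTL runs out, even though the record exists again.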

Since this doesn’t appear to be a Cloudflare issue, but rather a caching issue combined with a bug in a project that was using Cloudflare’s API improperly, I think I’m good.

Well, currently it still does not return anything.

If you are updating/removing in quick succession, I could imagine there could be such issues. Only update when you actually need to update. As for now, I’d drop the records, wait half an hour, and re-create them. Should they still not show up, open a support ticket.
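A minimal sketch of what "only update when you actually need to" could look like: compare the records that exist with the records you want, and issue API calls only for the differences. This is not how external-dns is implemented. The function name, the record name, and the addresses are hypothetical, and the actual API calls are left out.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, str, str]  # (action, record name, IP address)

def plan_changes(current: Dict[str, str], desired: Dict[str, str]) -> List[Change]:
    """Return only the changes needed to move from current to desired."""
    changes: List[Change] = []
    for name, ip in sorted(desired.items()):
        if name not in current:
            changes.append(("create", name, ip))
        elif current[name] != ip:
            changes.append(("update", name, ip))  # update in place, no delete + add
    for name, ip in sorted(current.items()):
        if name not in desired:
            changes.append(("delete", name, ip))
    return changes

current = {"ping.example.test": "192.0.2.10"}

# Nothing changed: no API calls, so the record never disappears.
print(plan_changes(current, {"ping.example.test": "192.0.2.10"}))

# The IP changed: a single in-place update.
print(plan_changes(current, {"ping.example.test": "192.0.2.20"}))
```

With a plan like this, a sync that finds nothing to change makes no calls at all, so there is never a window where the record is missing.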