Zero-downtime-failover question

I have a question about the zero-downtime-failover feature included with Business plans. We currently have an A record registered with a single IP address. In the DNS UI, this record appears as ‘Proxied’. I would like to add a second IP to the same domain that points to a duplicate k8s cluster hosting the same content as the original cluster. The intention is that both clusters will serve content simultaneously, and if one goes down, Cloudflare will stop sending requests to that cluster. It sounds like the zero-downtime-failover feature would work for our use case.

However, a colleague had some concerns. While describing the feature, I mentioned the concept of ‘round robin’ DNS (I realize that the implementation is probably not traditional round robin DNS, but it helped in explaining the feature). They found this doc that mentions the drawbacks of using round robin for failover: https://www.cloudflare.com/learning/dns/glossary/round-robin-dns/ . Their main concern was that end users would cache one of the two records and continue to make requests to the failed cluster. However, I’m not sure this applies to domains that are ‘Proxied’. When I resolve the domain from my local machine, I get a series of 3 addresses that don’t match the IP that our A record points to. If those are Cloudflare's addresses rather than ours, we shouldn’t have to worry about end-user DNS caching.

Can anyone comment as to whether these are valid concerns?

Exactly. Since Cloudflare is proxying your record, the DNS records that users actually receive are the same regardless of your DNS config.

This failover feature works by first requesting from one of the IP addresses (still determined via round robin I would guess; not sure if this is documented), and if it fails, sending the same request to the other IP address.

Do remember that it only retries for certain status codes - so any other 5xx class error like 500, 502, 504, etc. won’t be retried on the other IP.

Cloudflare currently retries only once for HTTP 521, 522, and 523 response codes.
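To make the retry behaviour described above concrete, here is a minimal Python sketch. It is not Cloudflare's implementation: the function name `fetch_with_failover`, the `send` callback, and the random choice of the first origin are all assumptions made for illustration (how the first origin is actually picked is not confirmed in this thread). It only models "try one IP, and on a 521, 522, or 523 retry once on another IP".

```python
import random

# Status codes that trigger a single retry, per the post above.
RETRYABLE_STATUSES = {521, 522, 523}


def fetch_with_failover(origins, send, rng=random):
    """Try one origin; on a retryable status, retry once on another origin.

    origins: list of origin IP strings (hypothetical values).
    send:    hypothetical callback taking an IP and returning an HTTP status.
    Returns a tuple (ip_that_answered, status).
    """
    first = rng.choice(origins)  # assumption: the first origin is picked at random
    status = send(first)
    others = [ip for ip in origins if ip != first]
    if status in RETRYABLE_STATUSES and others:
        fallback = rng.choice(others)
        return fallback, send(fallback)  # one retry only, no further attempts
    # Any other status (including 500, 502, 504) is returned as-is.
    return first, status
```

With two origins where one answers 521 and the other answers 200, every call ends with a 200 from the healthy origin. If the broken origin answers 500 instead, roughly half of the calls return that 500 unchanged, which is the limitation noted above.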


Here’s the blog post on it. We know it’s semi-round-robin, as not everybody hits the same origin when there’s more than one. It’s certainly not load balanced.

Hi sdownes.
That would be quite a strange way to get failover, and I don't think it is the best way to achieve this. I would not do it that way!
From the docs, it looks like as long as one origin is down, requests will still be sent to the non-working origin server first, and only when that fails will Cloudflare try the other DNS record. That is quite bad, because it will happen on nearly every request. There is also no way to control which requests are served from where, which in turn will make your website slower: requests will be sent to both origin servers, and if one origin server is on the opposite end of the world, your site will have to wait for that long-distance connection.

Not sure why you want to use round robin DNS to achieve this.

If you want high availability with failover and good performance, then the Cloudflare Load Balancing service is the perfect solution for this.

It runs health checks against your origin servers, and if one goes down it sends the traffic to your other server, while routing requests to the fastest and nearest server.
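As a rough illustration of the difference from retry-on-failure, the sketch below picks an origin only from those that passed their last health check, then prefers the lowest measured latency. This is a simplified model and not the Load Balancing product's actual logic; `route`, `is_healthy`, and `latency_ms` are hypothetical names introduced for the example.

```python
def route(origins, is_healthy, latency_ms):
    """Return the healthy origin with the lowest latency.

    origins:    list of origin names (hypothetical values).
    is_healthy: dict mapping origin -> bool, result of the last health check.
    latency_ms: dict mapping origin -> measured latency in milliseconds.
    """
    # Unhealthy origins are excluded up front, so no request is sent to them.
    candidates = [o for o in origins if is_healthy.get(o, False)]
    if not candidates:
        raise RuntimeError("no healthy origin available")
    return min(candidates, key=lambda o: latency_ms[o])
```

The point of the design is that a failed origin is removed from rotation by the health check, so visitors do not pay for a failed first attempt on each request.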

See description:

Increase reliability with fast failover

Route your visitors away from unhealthy origins and failover instantly with zero downtime. As soon as an origin or pool goes down, requests proxied through Cloudflare get instantly re-routed — without waiting for TTLs to expire.

here is how to do it:
