What if Workers go down?

Hello everyone, I’m back here with a question.
I’m currently creating an application where every request has to go through and be successful.

And I just have a couple of questions here.

  1. Cloudflare has great uptime, we all know that. But how great are the chances that Workers go down, and for how long?

  2. If Workers go down, how else can I handle the request? Does a fallback happen automatically? Let’s say I have set api.example.com to 0.0.0.0, and then put a worker on api.example.com/v1. Since Cloudflare is the primary DNS resolver, the worker will handle those requests. If Workers fail, will Cloudflare automatically forward the request to 0.0.0.0, or will everything just fail? If everything fails, is there a way to make the request go through anyway?

EDIT: I’m also aware that yes, I can submit new requests, but I just want to “optimize” the backend system as much as possible to make sure requests almost never fail.

Workers run on the same infrastructure as the whole proxy platform. If there is an issue, chances are it affects not only Workers but the entire PoP, in which case Cloudflare will either not accept the request in the first place or won’t be able to forward it to the origin.

Now, if your question is what happens when there is an issue specific to the worker engine, and whether requests are still forwarded even if that failed, I am afraid I am not sure Cloudflare’s support could answer that straight away. That’s probably something for engineering.

@KentonVarda @harris


Alrighty, I appreciate the answer you provided. I was also thinking that if Workers went down, there would be some major issues with Cloudflare as a platform as well, but I just wanted to clear up any confusion in my mind.

And I guess we’ll have to see what the engineers have to say about whether the requests continue or not.

@sandro is basically right: Workers is tightly integrated with the rest of the Cloudflare proxy stack, and there’s not really any way it could “go down” without the whole system going down. When Cloudflare receives a request and determines that the request needs to run a worker, that worker runs directly on the same machine.

We actually can’t automatically fall back to your origin server when something goes wrong, because some people use Workers to implement security checks, so falling back would be a security flaw. However, inside your worker, you can write:

event.passThroughOnException();

This tells the system that if anything goes wrong, it is safe to fall back to the origin. This includes if your worker code throws an exception or goes over the CPU limit.
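For context, here is a minimal sketch of where that call might sit in a complete worker, using the same service-worker style event API as the line above. The names `handleRequest` and `onFetch` and the `x-handled-by` header are made up for illustration and are not from this thread. As noted above, this only makes sense if the worker does not enforce security checks.

```javascript
// Hypothetical handler: forwards the request to the origin and tags the
// response with an illustrative header.
async function handleRequest(request) {
  const originResponse = await fetch(request);
  // Copy the response so that its headers can be modified.
  const response = new Response(originResponse.body, originResponse);
  response.headers.set("x-handled-by", "worker");
  return response;
}

function onFetch(event) {
  // Opt in to falling back to the origin if this worker throws an
  // exception or exceeds its CPU limit. Called before any other work.
  event.passThroughOnException();
  event.respondWith(handleRequest(event.request));
}

// In the Workers runtime this registers the handler. The guard only lets
// the sketch load in environments without a global addEventListener.
if (typeof addEventListener === "function") {
  addEventListener("fetch", onFetch);
}
```

Calling `event.passThroughOnException()` first means the fallback is already in place if anything later in the handler fails.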


So basically if the worker engine goes down it will bring the rest down with it too? Even for domains which are not using workers?


@sandro Well, I’d say more the other way around: The only real way for the Workers Runtime to “go down” is because of some system-wide problem.

The Workers Runtime itself just doesn’t have a lot of ways that it can “go down”. It’s a stateless system that can restart nearly instantaneously, so any time it crashes, uses too much memory, or detects anything else wrong with itself, it just restarts, and the problem is solved. Usually no one outside Cloudflare even notices that anything happened. In order to cause a persistent problem, something has to be broken elsewhere in the system.

Of course, this is over-simplifying. Real-world outages have many complex causes, and the exact impact of any particular outage is usually different from all others. So it’s hard to say anything definitive without focusing on a specific kind of outage. But, in general, the Workers Runtime is pretty resilient.


All right, fair enough, so basically a request has to pass through Workers in order to be proxied to the origin, right? And should there be any exception within the JavaScript code, that can be addressed via event.passThroughOnException().

I guess that answers @Refactor’s original question, doesn’t it?
