2019/9/19 Workers Runtime Release Notes -- Concurrent Subrequest Limit

Changes this week:

  • Unannounced new feature. (Stay tuned.)
  • Enforced new limit on concurrent subrequests (see below).
  • Stability improvements.

Concurrent Subrequest Limit

As of this release, we impose a limit on the number of outgoing HTTP requests that a worker can make simultaneously. For each incoming request, a worker can make up to 6 concurrent outgoing fetch() requests.

If a worker’s request handler attempts to call fetch() more than six times (on behalf of a single incoming request) without waiting for previous fetches to complete, then fetches after the sixth will be delayed until previous fetches have finished. A worker is still allowed to make up to 50 total subrequests per incoming request, as before; the new limit is only on how many can execute simultaneously.
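For illustration, here is a minimal sketch of a handler (the urls array and the handler wiring around it are hypothetical) that makes ten subrequests for a single incoming request. Because each body is read as soon as its own response arrives, fetches after the sixth are merely delayed, and the handler still works:

async function handleRequest(request) {
  // Hypothetical list of more than six backend URLs.
  const urls = ["https://example.com/a", "https://example.com/b" /* ...ten in total... */]

  // Start all ten fetches. The runtime runs at most six at a time;
  // reading each body promptly lets earlier fetches finish and free
  // up slots for the remaining ones.
  const bodies = await Promise.all(urls.map(async (url) => {
    const response = await fetch(url)
    return response.text()
  }))

  return new Response(bodies.join("\n"))
}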

Automatic deadlock avoidance

Our implementation automatically detects whether delaying a fetch would cause the worker to deadlock, and prevents the deadlock by cancelling the least-recently-used request.

For example, imagine a worker that starts 10 requests and waits to receive all the responses without reading the response bodies. A fetch is not considered complete until the response body is fully consumed (e.g. by calling response.text() or response.json(), or by reading from response.body). Therefore, in this scenario, the first six requests will run and their response objects will be returned, but the remaining four requests will not start until the earlier responses are consumed. If the worker never actually reads the earlier response bodies and is still waiting for the last four requests, then the Workers Runtime will automatically cancel the first four requests so that the remaining ones can complete. If the worker later goes back and tries to read those response bodies, exceptions will be thrown.
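As a rough sketch of that scenario (inside a request handler, with a hypothetical array urls of ten URLs):

// Wait for all ten Response objects before reading any bodies.
// Only six connections can be open at once, and none of them can
// finish until its body is read, so this would deadlock on its own.
const responses = await Promise.all(urls.map((url) => fetch(url)))

// The runtime breaks the deadlock by cancelling the earliest requests
// so that the last four fetches can complete. Going back to read one
// of the cancelled bodies then throws.
try {
  const first = await responses[0].json()
} catch (err) {
  // This request was cancelled to let the later fetches proceed.
  console.log("body read failed: " + err)
}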

Most Workers are Not Affected

The vast, vast majority of workers make fewer than six outgoing requests per incoming request. Such workers are totally unaffected by this change.

Of workers that do make more than six outgoing requests concurrently for a single incoming request, the vast majority either read the response bodies immediately upon each response returning, or never read the response bodies at all. In either case, these workers will still work fine – although they may be a little slower due to outgoing requests after the sixth being delayed.

A very very small number of deployed workers (about 20 total) make more than 6 requests concurrently, wait for all responses to return, and then go back to read the response bodies later. For all known workers that do this, we have temporarily grandfathered your zone into the old behavior, so that your workers will continue to operate. However, we will be communicating with customers one-by-one to request that you update your code to proactively read response bodies, so that it works correctly under the new limit.

Why did we do this?

Cloudflare communicates with origin servers using HTTP/1.1, not HTTP/2. Under HTTP/1.1, each concurrent request requires a separate connection. So, workers that make many requests concurrently could force the creation of an excessive number of connections to origin servers. In some cases, this caused resource exhaustion problems either at the origin server or within our own stack.

When we investigated the use cases for such workers, every case we looked at turned out to be a mistake or otherwise unnecessary. Often, developers were making requests and receiving responses, but they only cared about the response status and headers, not the body. So they threw away the response objects without reading the bodies, essentially leaking connections. In other cases, developers had simply written code that accidentally made excessive requests in a loop for no good reason. Both of these cases should now cause no problems under the new behavior.
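If only the status and headers matter, one way to avoid holding the connection open is to discard the body explicitly. A minimal sketch, assuming the standard Fetch API stream behavior of response.body.cancel():

async function checkEndpoint(url) {
  const response = await fetch(url)
  if (response.body) {
    // Explicitly cancel the unused body so the underlying
    // connection is released rather than left dangling.
    await response.body.cancel()
  }
  return response.status
}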

We chose the limit of 6 concurrent connections based on the fact that Chrome enforces the same limit on web sites in the browser. I personally have never heard of anyone having trouble with Chrome’s limit.


Can’t wait for #BirthdayWeek!


I figured I'd jumped the gun a little bit, thinking this limit was per script. That would have been a shocker :wink:

Is there an ETA for when fetch() will use HTTP/2?


Hey, looks like my script was one of those 20—it was failing in production and so I heard about this from an end user today. Can you check what happened in terms of that communication?

I’m curious, what are you using it for? :slight_smile:

I replaced a reasonably big Java service that was, at its core, simply combining seven different JSON feeds into one combined feed. So I had an array of requests that I fetched all at once, and then read all the bodies at once. Looks like seven feeds was one too many :slight_smile:


Sounds like the perfect use case; too bad about the limit, though.
Maybe you can split it up into multiple scripts? The limit is 6 per script/request.

Hi @marko,

I’m sorry to hear we apparently broke you – we tried very hard not to.

For the ~20 zones that we detected would have been broken by our change, we have not yet applied the change at all. If you saw something change, then you aren’t one of those 20.

What kind of breakage are you seeing, exactly? If you requested N (> 6) feeds at once and waited for all the response bodies, then what you should be seeing is that the first six responses come back right away, but the seventh response doesn't come until at least one of the first six has finished. It would be delayed, but it shouldn't be broken.

The code that broke was:

// Start all the fetches at once...
let fetches = []
for (const source of sourceConfig) {
  fetches.push(fetch(source))
}

// ...then wait for every Response to arrive before reading any body.
let jsonReads = []
for (const response of await Promise.all(fetches)) {
  jsonReads.push(response.json())
}

const jsons = await Promise.all(jsonReads)

This triggered the deadlock detection and cancelled one of the requests, breaking the script. In retrospect this wasn’t a great way to implement my logic—why wait for all responses before processing the bodies—but it was the simplest approach I could think of at the time :slight_smile:

Hmm, that would indeed break, but we should have detected and grandfathered your worker, if it ran at all in the two weeks before the change was made. Could you tell me what zone (domain) this is on, so I can investigate?

Hi @KentonVarda ,

Can you explain a bit more why this example would break? From your explanation of the new behavior, the seventh fetch should just be delayed, so await Promise.all(fetches) would take a little longer, but it shouldn't break, should it?

I feel like a lot of people will be limited if they can't make more than 6 simultaneous requests without an error being thrown.

@michael.hart.au The problem with @marko’s example is that it waits for all of the Response objects to arrive (i.e., receives the headers of all responses) before it attempts to read any of their bodies. That requires more than 6 simultaneous HTTP/1.1 connections. Since the code won't read any body until every response has arrived, none of the first six fetches can ever complete, so the delayed seventh fetch could never start: a deadlock. The runtime resolves the deadlock by cancelling one of the earlier requests, and reading that cancelled response's body later throws, which is what broke the script.

If the code is refactored slightly so that it reads each response body immediately upon receiving that response (independent of any other responses), then it will work fine. Like this:

async function fetchJson(req) {
  let response = await fetch(req)
  if (!response.ok) {
    throw new Error("HTTP error: " + response.status)
  }
  return await response.json()
}

// Kick off all the fetches; each body is read inside fetchJson()
// as soon as its own response arrives.
let jsonReads = []
for (const source of sourceConfig) {
  jsonReads.push(fetchJson(source))
}

const jsons = await Promise.all(jsonReads)

There's no ETA for that, but using HTTP/2 would not necessarily solve the problem anyway. Each request is still a logically separate interaction that requires resources to be allocated regardless of whether it is multiplexed over a single TCP stream. Moreover, it would be difficult for the Workers runtime to tell which outgoing requests will eventually be candidates for multiplexing by the egress proxy – these are very different layers in our tech stack.


Yeah, I figured as much – my point was more that the new behavior can cause errors, which wasn’t how it was originally announced.

FWIW I think it’s a pity it’s being limited in this way – it wasn’t so long ago that Workers were going to be “the future of Serverless and cloud computing in general” :wink:

My bad, I just re-read the announcement and you do describe the deadlock exceptions – so it was announced this way, I just glossed over it.

As I mentioned, we have yet to see or imagine any real-world use case where this turns out to be a real limiter. If you have one I’d be interested to hear about it. Keep in mind that this limit is per incoming request, not per worker; you can still have a worker handling millions of concurrent requests just fine.

There are a number of use-cases I can give from our current Node.js GraphQL backend – basically whenever you have parallel resolvers that you can’t batch – either they’re fetching from different backends, or the backend doesn’t support batching. Say you want to fetch 10 items by id in parallel.

My understanding is that if you wanted to run a GraphQL server like this in Workers it would have to fetch the first 6, and then the next 4, before it could respond.

There are a number of more sophisticated use cases I can think of as well – for example we use a fan-out in Lambda right now, with one Lambda invoking 80 more in parallel. This would be quite a bit slower if it could only be done in batches of 6. Any scatter-gather / map-reduce use case like this would suffer.


The zone is richie.fi – I can give you more details (such as the actual script name) over email.

@marko Hmm, it appears that richie.fi was not among the zones that we detected would be affected by the change. Is it possible that the worker in question didn’t run (or, didn’t attempt to make more than six concurrent connections) at all between 9/2 and 9/15? Otherwise I’m confused why we didn’t detect it. :confused: