2019/9/19 Workers Runtime Release Notes -- Concurrent Subrequest Limit

The code that broke was:

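let fetches = []
for (const source of sourceConfig) {
  // Start every fetch up front, without awaiting any of them.
  fetches.push(fetch(source.url))
}

let jsonReads = []
for (const response of await Promise.all(fetches)) {
  // Bodies are only read once all of the responses have arrived.
  jsonReads.push(response.json())
}

const jsons = await Promise.all(jsonReads)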

This triggered the deadlock detection and cancelled one of the requests, breaking the script. In retrospect this wasn’t a great way to implement my logic (why wait for all responses before processing the bodies?), but it was the simplest approach I could think of at the time :slight_smile:

Hmm, that would indeed break, but we should have detected and grandfathered your worker, if it ran at all in the two weeks before the change was made. Could you tell me what zone (domain) this is on, so I can investigate?

Hi @KentonVarda ,

Can you explain a bit more why this example would break? From your explanation of the new behavior, the seventh fetch should just be delayed, so await Promise.all(fetches) would take a little longer, but it shouldn’t break should it?

I feel like a lot of people will be limited if they can’t make more than 6 simultaneous requests without throwing an error.

@michael.hart.au The problem with @marko’s example is that it waits for all of the Response objects to arrive (i.e., receives the headers of all responses) before it attempts to read any of their bodies. This requires more than 6 simultaneous HTTP/1.1 connections.

If the code is refactored slightly so that it reads each response body immediately upon receiving that response (independent of any other responses), then it will work fine. Like this:

async function fetchJson(req) {
  let response = await fetch(req)
  if (!response.ok) {
    throw new Error("HTTP error: " + response.status)
  }
  // Read the body right away, so this connection is freed for the next fetch.
  return await response.json()
}

let jsonReads = []
for (const source of sourceConfig) {
  jsonReads.push(fetchJson(source.url))
}

const jsons = await Promise.all(jsonReads)
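
To see why the ordering matters, here is a toy model of the two patterns. It is only a sketch under stated assumptions, not the Workers runtime: `SlotLimiter`, `fakeFetch`, `countStalledFetches`, and `readEachImmediately` are made-up names, and the model assumes a connection slot is held from the start of a fetch until its body has been read. The real runtime also cancels a request when it detects the deadlock, whereas this model simply stalls.

```javascript
// Toy limiter: at most `limit` slots may be held at once.
class SlotLimiter {
  constructor(limit) {
    this.limit = limit
    this.active = 0
    this.queue = [] // resolvers for fetches waiting on a slot
  }
  acquire() {
    if (this.active < this.limit) {
      this.active++
      return Promise.resolve()
    }
    // No free slot: wait until release() hands one over.
    return new Promise(resolve => this.queue.push(resolve))
  }
  release() {
    const next = this.queue.shift()
    if (next) next() // pass the slot straight to the next waiter
    else this.active--
  }
}

// Hypothetical stand-in for fetch(): resolves once a slot is held and
// frees that slot only when json() is called.
async function fakeFetch(limiter, id) {
  await limiter.acquire()
  return {
    json: async () => {
      limiter.release()
      return { id }
    },
  }
}

// Pattern 1: start every fetch and read no bodies. Returns how many
// fetches are stuck waiting for a slot that nothing will ever release,
// which is why Promise.all(fetches) could never resolve.
function countStalledFetches(count, limit) {
  const limiter = new SlotLimiter(limit)
  for (let i = 0; i < count; i++) fakeFetch(limiter, i)
  return limiter.queue.length
}

// Pattern 2: read each body as soon as its response arrives, so slots
// are recycled and every fetch eventually completes.
function readEachImmediately(count, limit) {
  const limiter = new SlotLimiter(limit)
  const reads = []
  for (let i = 0; i < count; i++) {
    reads.push(fakeFetch(limiter, i).then(response => response.json()))
  }
  return Promise.all(reads)
}
```

With seven fetches and a limit of six, the first pattern leaves one fetch permanently queued, while the second resolves with all seven bodies.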

No, but using HTTP/2 would not necessarily solve the problem anyway. Each request is still a logically separate interaction that requires resources to be allocated regardless of whether it is multiplexed over a single TCP stream. Moreover, it would be difficult to tell from the Workers runtime which outgoing requests will eventually be candidates for multiplexing by the egress proxy – these are very different layers in our tech stack.

Yeah, I figured as much – my point was more that the new behavior can cause errors, which wasn’t how it was originally announced.

FWIW I think it’s a pity it’s being limited in this way – it wasn’t so long ago that Workers were going to be “the future of Serverless and cloud computing in general” :wink:

My bad, I just re-read the announcement and you do describe the deadlock exceptions – so it was announced this way, I just glossed over it.

As I mentioned, we have yet to see or imagine any real-world use case where this turns out to be a real limiter. If you have one I’d be interested to hear about it. Keep in mind that this limit is per incoming request, not per worker; you can still have a worker handling millions of concurrent requests just fine.

There are a number of use-cases I can give from our current Node.js GraphQL backend – basically whenever you have parallel resolvers that you can’t batch – either they’re fetching from different backends, or the backend doesn’t support batching. Say you want to fetch 10 items by id in parallel.

My understanding is that if you wanted to run a GraphQL server like this in Workers it would have to fetch the first 6, and then the next 4, before it could respond.

There are a number of more sophisticated use cases I can think of as well – for example we use a fan-out in Lambda right now, with one Lambda invoking 80 more in parallel. This would be quite a bit slower if it could only be done in batches of 6. Any scatter-gather / map-reduce use case like this would suffer.
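
For comparison with fixed batches, a capped fan-out can also be written as a sliding window, where the next task starts as soon as any slot frees up. The following is a minimal sketch: `mapWithLimit` is a hypothetical helper, not part of the Workers API, and whether it matches how the runtime itself queues excess subrequests is an assumption.

```javascript
// Runs `task` over `items` with at most `limit` tasks in flight at once.
// Results are returned in the same order as `items`.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length)
  let next = 0 // index of the next item that has not been started yet

  // Each worker pulls the next unstarted item until none are left.
  async function worker() {
    while (next < items.length) {
      const i = next++
      results[i] = await task(items[i], i)
    }
  }

  const workers = []
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker())
  }
  await Promise.all(workers)
  return results
}
```

A call such as `mapWithLimit(ids, 6, id => fetchItem(id))`, where `fetchItem` is a placeholder for whatever reads one item and its body, keeps six requests busy until the list is exhausted instead of waiting for each batch of six to finish.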

The zone is richie.fi – I can give you more details (such as the actual script name) over email.

@marko Hmm, it appears that richie.fi was not among the zones that we detected would be affected by the change. Is it possible that the worker in question didn’t run (or, didn’t attempt to make more than six concurrent connections) at all between 9/2 and 9/15? Otherwise I’m confused why we didn’t detect it. :confused:

No chance of that, it’s been run thousands of times daily with seven items. However, it is an old script—last deployed in December 2018.

Ouch, I guess our detection code must have missed something. I’m really sorry about that. :frowning:

What we did was write the limiter code, but instead of having it enforce the limit, it merely logged whenever it would have kicked in. We then collected a list of all the zones that caused such logs. But I guess somehow it wasn’t logging in exactly the same circumstances where we later ended up enforcing things. Looking at the code I can’t really see how this is possible, but clearly your experience demonstrates that it is…

I guess by now you’ve changed your code so there’s no useful action for us to take.

So it turns out this came down to an off-by-one error… it looks like the relevant place in the logging code used > when it should have used >= and the effect was that we logged only if you made 8 or more concurrent connections. Your worker managed to make exactly 7 and never any more… (Whereas the other ~20 workers were mostly making dozens of requests, or a variable number.)
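
The off-by-one can be sketched roughly like this. It is a hypothetical reconstruction, not the runtime's actual logging code: `LIMIT`, `wouldLimitBuggy`, and `wouldLimitFixed` are invented names, and it assumes the check runs when a new connection is about to start, with `inFlight` counting the connections already open.

```javascript
// Concurrent subrequests allowed per incoming request.
const LIMIT = 6

// Buggy check: with 6 connections already open, starting a 7th is not
// flagged, so only workers reaching 8 or more ever get logged.
function wouldLimitBuggy(inFlight) {
  return inFlight > LIMIT
}

// Corrected check: starting a 7th connection (6 already open) is flagged.
function wouldLimitFixed(inFlight) {
  return inFlight >= LIMIT
}
```

A worker that opens exactly seven connections calls the check with `inFlight` equal to 6 at most, which the buggy version lets through and the corrected version flags.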

They say there are two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.


@KentonVarda one use case for our REST API is as follows.

A single client API will make multiple requests to render our UI - we know this, because there are many atomic calls to fetch each bit.

At present, a Worker is spinning up for both each specific API request to the origin AND each request from visitor to Cloudflare, I imagine.

Could a Worker PoP/node contact the origin over HTTP/2 and ensure that over a certain period e.g. 60 seconds or whatever multiplexing lets you set a given origin’s connection as “open” i.e. a timeout?

Otherwise any Worker that has to fetch something from an origin will do it individually, over and over again.

I hope this example makes sense.

Hi @tallyfy,

Hmm, sorry, I don’t understand what you’re asking.

Not necessarily. It could be the same worker handling all these requests. One worker instance can handle multiple concurrent requests. Note that the concurrent subrequest limit applies to a request context, not to a worker instance. You can have six concurrent subrequests active on behalf of each incoming request, so if a single worker is handling multiple requests, it could have more than six concurrent subrequests in total.

Using HTTP/2 to talk to origin is a question for a different part of the Cloudflare stack. If and when Cloudflare adds support for this, it will apply to workers and non-workers requests alike. It will not affect the concurrent subrequest limit, though.

I don’t understand what you mean here.

@KentonVarda thanks for the response.

Here are the two connection areas for us:

[visitor] <> Cloudflare <> [origin]

I understand http2 is used between visitor and Cloudflare.

However, if a worker handles all requests between the visitor and Cloudflare, then how is http2 done between Cloudflare and origin?

If the same visitor makes 8 requests to the same domain (one every half a second), all handled by a worker, is that 8 separate requests dispatched by Cloudflare to the origin across 8 separate non-HTTP/2 connections?

To me - if that is true, it brings down the value of uncachable requests arriving via http2 between [visitor] and Cloudflare

Hi @tallyfy,

So, first, note that this has nothing to do with Workers – whether you use Workers or not, the way Cloudflare talks to your origin is the same.

I believe Cloudflare currently only uses HTTP/1.1 to talk to your origin, never HTTP/2 (except, I think, for gRPC requests, but I don’t know the details of how that works).

But, note that HTTP/1.1 supports connection reuse! This is often misunderstood. A single HTTP/1.1 connection can be used for many requests, but only one request at a time. HTTP/2’s benefit is that it can handle multiple requests simultaneously.

Cloudflare tries to reuse connections to your origin, because starting up new connections is slow (requiring TCP handshake, TLS handshake, etc.). If you have a steady stream of traffic to your site, then Cloudflare will reuse a stable set of connections to serve many different users.

Indeed, this is why HTTP/2 to origin is not as important as HTTP/2 to clients. In theory, with HTTP/2, we could make fewer connections, by packing lots of concurrent requests (possibly from many different users) on a single connection. But, because we can already maintain a pool of HTTP/1.1 connections over a long period to handle traffic from many clients, connection setup overhead is already largely mitigated. In fact, as long as the connections are set up in advance, a large number of HTTP/1.1 connections probably performs better than a smaller number of HTTP/2 connections, because it’s easier to load balance a larger number of connections across multiple threads, processes, or machines on your receiving end.
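
The reuse behaviour described above can be illustrated with a toy connection pool. This is a sketch only and says nothing about how Cloudflare's pool is actually implemented: `Http1Pool`, `begin`, `end`, and `handshakes` are hypothetical names, and the model captures just the rule that an HTTP/1.1 connection serves one request at a time but can be reused afterwards.

```javascript
// Toy model of an HTTP/1.1 connection pool.
class Http1Pool {
  constructor() {
    this.idle = []      // open connections not currently serving a request
    this.handshakes = 0 // how many times a new connection had to be set up
    this.nextId = 1
  }

  // Begin a request: reuse an idle connection if there is one,
  // otherwise open a new one and pay the setup cost.
  begin() {
    let conn = this.idle.pop()
    if (!conn) {
      this.handshakes++ // stands in for the TCP and TLS handshakes
      conn = { id: this.nextId++ }
    }
    return conn
  }

  // Finish a request: the connection returns to the pool for reuse.
  end(conn) {
    this.idle.push(conn)
  }
}
```

In this model, any number of back-to-back requests needs only one handshake, while three overlapping requests need three connections. Once those three are open and returned to the pool, later traffic reuses them without further setup.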

@KentonVarda this explains a lot - thank you!

In this case, if HTTP/1.1 is what is used between Cloudflare and origin, and is favorable too, would it be better to have my Worker call our origin over http:// and not https://?

I have a choice of pushing a request to our origin on AWS over http or https - so not sure what to pick if performance was the only consideration.

A sub-question in this context. We are sending each request from our Cloudflare Worker to one of these choices on AWS:

  1. An AWS global accelerator URL - ending with the URL *.awsglobalaccelerator.com
  2. An ALB URL - directly calling the ALB - ending with the URL *.amazonaws.com
  3. A Cloudfront endpoint - ending with the URL *.cloudfront.net

Which one would provide the highest performance when the Worker “hands off” the request to one of (1), (2) or (3) to be served by the origin?

All of the above have their own AWS-supplied URLs, which can be CNAMEs if needed.

No. Because Cloudflare reuses connections, the TLS handshake time is mostly mitigated, so there’s very little performance advantage to using HTTP rather than HTTPS to origin.

Sorry, I don’t know anything about those AWS products, so I don’t know what the difference is.