Changes this week:
- Unannounced new feature. (Stay tuned.)
- Enforced new limit on concurrent subrequests (see below).
- Stability improvements.
Concurrent Subrequest Limit
As of this release, we impose a limit on the number of outgoing HTTP requests that a worker can make simultaneously. For each incoming request, a worker can make up to 6 concurrent outgoing fetch() requests.
If a worker’s request handler attempts to call fetch() more than six times (on behalf of a single incoming request) without waiting for previous fetches to complete, then fetches after the sixth will be delayed until previous fetches have finished. A worker is still allowed to make up to 50 total subrequests per incoming request, as before; the new limit is only on how many can execute simultaneously.
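To make the delay concrete, here is a rough model of the queueing described above. It is only a sketch, not the runtime's actual scheduler: the function name simulateStartTimes and the millisecond durations are made up for illustration, and it assumes every fetch is issued at time 0 and completes (body included) after a fixed duration.

```javascript
// Illustrative model only; the real Workers Runtime scheduler is not shown in this post.
// Assumption: all requests are issued at time 0, in order, and durations[i] is
// how long request i takes once it has started.
function simulateStartTimes(durations, limit = 6) {
  const running = []; // finish times of requests currently in flight
  const starts = [];  // computed start time of each request

  for (const duration of durations) {
    let start = 0;
    if (running.length >= limit) {
      // All slots are busy: wait for the earliest in-flight request to finish.
      running.sort((a, b) => a - b);
      start = running.shift();
    }
    starts.push(start);
    running.push(start + duration);
  }
  return starts;
}

// Ten requests of 100 ms each: the first six start immediately,
// the remaining four wait for a slot to free up.
console.log(simulateStartTimes(Array(10).fill(100)));
// → [0, 0, 0, 0, 0, 0, 100, 100, 100, 100]
```

In this model the total subrequest count is untouched; only the start times of the seventh and later requests move.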
Automatic deadlock avoidance
Our implementation automatically detects when delaying a fetch would cause the worker to deadlock, and prevents the deadlock by cancelling the least-recently-used request. For example, imagine a worker that starts 10 requests and waits to receive all the responses without reading the response bodies. A fetch is not considered complete until the response body is fully consumed (e.g. by calling response.json(), or by reading from response.body). Therefore, in this scenario, the first six requests will run and their response objects will be returned, but the remaining four requests will not start until the earlier responses are consumed. If the worker fails to actually read the earlier response bodies and is still waiting for the last four requests, then the Workers Runtime will automatically cancel the first four requests so that the remaining ones can complete. If the worker later goes back and tries to read the response bodies of the cancelled requests, exceptions will be thrown.
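The difference between the two patterns can be sketched as follows. This is an illustration, not code from any real worker: the helper names fetchAllThenRead and fetchAndReadEach are hypothetical, the bodies are assumed to be JSON, and the fetch function is passed in as a parameter only so the helpers can be exercised with a stub.

```javascript
// Hypothetical helpers for illustration; neither is part of the Workers API.

// Pattern that runs into the scenario above once urls.length exceeds six:
// it waits for every response object before consuming any body.
async function fetchAllThenRead(urls, fetchFn = fetch) {
  const responses = await Promise.all(urls.map((url) => fetchFn(url)));
  // By the time we get here, earlier requests may already have been cancelled,
  // in which case reading their bodies would throw.
  return Promise.all(responses.map((response) => response.json()));
}

// Pattern that stays within the limit: each body is consumed as soon as its
// response arrives, so the fetch counts as complete and the next one can start.
async function fetchAndReadEach(urls, fetchFn = fetch) {
  return Promise.all(
    urls.map(async (url) => {
      const response = await fetchFn(url);
      return response.json(); // consume the body right away
    })
  );
}
```

Both helpers return the parsed bodies in the same order as the input URLs. With more than six URLs, the second form may take a little longer than an unlimited burst would, since later fetches wait for earlier ones to finish.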
Most Workers are Not Affected
The vast, vast majority of workers make fewer than six outgoing requests per incoming request. Such workers are totally unaffected by this change.
Of workers that do make more than six outgoing requests concurrently for a single incoming request, the vast majority either read the response bodies immediately upon each response returning, or never read the response bodies at all. In either case, these workers will still work fine – although they may be a little slower due to outgoing requests after the sixth being delayed.
A very small number of deployed workers (about 20 total) make more than 6 requests concurrently, wait for all responses to return, and then go back to read the response bodies later. For all known workers that do this, we have temporarily grandfathered the affected zones into the old behavior, so that those workers will continue to operate. However, we will be communicating with these customers one by one to request that they update their code to proactively read response bodies, so that it works correctly under the new limit.
Why did we do this?
Cloudflare communicates with origin servers using HTTP/1.1, not HTTP/2. Under HTTP/1.1, each concurrent request requires a separate connection. So, workers that make many requests concurrently could force the creation of an excessive number of connections to origin servers. In some cases, this caused resource exhaustion problems either at the origin server or within our own stack.
On investigating the use cases for such workers, every case we looked at turned out to be a mistake or otherwise unnecessary. Often, developers were making requests and receiving responses, but cared only about the response status and headers, not the body. So, they threw away the response objects without reading the body, essentially leaking connections. In other cases, developers had accidentally written code that made excessive requests in a loop for no good reason at all. Both of these cases should now cause no problems under the new behavior.
We chose the limit of 6 concurrent connections based on the fact that Chrome enforces the same limit on web sites in the browser. I personally have never heard of anyone having trouble with Chrome’s limit.