Why the faster stream waits the slower one when using the tee() operator to fetch to R2?

I want to fetch an asset into R2 and at the same time return the response to the client. So simultaneously streaming into R2 and to the client too.

Related code fragment:

const originResponse = await fetch(request);

const originResponseBody = originResponse.body!!.tee()

ctx.waitUntil(
    env.BUCKET.put(objectName, originResponseBody[0], {
        httpMetadata: originResponse.headers
    })
)

return new Response(originResponseBody[1], originResponse);

I tested the download of an 1GB large asset with a slower, and a faster internet connection.

In theory the outcome (success or not) of putting to R2 should be the same in both cases. Because its independent of the client’s internet connection speed.

However, when I tested both scenarios, the R2 write was successful with the fast connection, and failed with the slower connection. That means that the ctx.waitUntil 30 second timeout was exceeded in case of the slower connection. It was always an R2 put “failure” when the client download took more than 30 sec.

It seems like the R2 put (the reading of that stream) is backpressured to the speed of the slower consumer, namely the client download.

Is this because otherwise the worker would have to enqueue the already read parts from the faster consumer?

Am I missing something? Could someone confirm this or clarify this? Also, could you recommend a working solution for this use-case of downloading larger files?

That is indeed how ReadableStream works with tee() - ReadableStream.tee() - Web APIs | MDN

Therefore, you should not use the built-in tee() to read very large streams in parallel at different speeds. Instead, search for an implementation that fully backpressures to the speed of the slower consumed branch.

The linked documentation states the opposite (namely that the faster is not backpressured, but enqueuing happens for the slower):

A teed stream will partially signal backpressure at the rate of the faster consumer of the two ReadableStream branches, and unread data is enqueued internally on the slower consumed ReadableStream without any limit or backpressure.

My exact problem is that the faster is backpressured to the speed of the slower and thus timeouts.

Seems entirely possible that the MDN documentation is irrelevant since workerd has their own implementation that deviates slightly from the spec.

1 Like

Thank you!

Indeed.

In our implementation, we have modified the tee() method implementation to avoid this issue.

Each branch maintains it’s own data buffer. But instead of those buffers containing a copy of the data, they contain a collection of refcounted references to the data. The backpressure signaling to the trunk is based on the branch wait the most unconsumed data in its buffer.

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.