I tested downloading a 1 GB asset over both a slower and a faster internet connection.
In theory, the outcome (success or failure) of the put to R2 should be the same in both cases, because it is independent of the client’s internet connection speed.
However, when I tested both scenarios, the R2 write succeeded with the fast connection and failed with the slower one. That means the 30-second ctx.waitUntil timeout was exceeded on the slower connection: the R2 put “failed” whenever the client download took more than 30 seconds.
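For reference, the pattern under discussion looks roughly like the sketch below. It is illustrative, not the exact code: `r2Put` and `waitUntil` are injected stand-ins for `env.BUCKET.put` and `ctx.waitUntil` so the sketch can run outside of Workers, and the key name is made up.

```javascript
// Sketch of the worker logic described above (illustrative, not the exact code).
// r2Put and waitUntil are injected stand-ins for env.BUCKET.put and
// ctx.waitUntil so the sketch is runnable outside of Workers.
async function handleAsset(originBody, r2Put, waitUntil) {
  // One stream from the origin, split into two branches with tee():
  const [toClient, toR2] = originBody.tee();

  // Background write to R2; in Workers this runs under the ctx.waitUntil
  // time limit, which is what the slow-client scenario exceeds.
  waitUntil(r2Put("asset", toR2));

  // The other branch is streamed to the client at the client's speed.
  return new Response(toClient);
}
```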
It seems like the R2 put (the reading of that stream) is backpressured to the speed of the slower consumer, namely the client download.
Is this because otherwise the worker would have to buffer the chunks already read by the faster consumer?
Am I missing something? Could someone confirm or clarify this? Also, could you recommend a working solution for this use case of downloading larger files?
Therefore, you should not use the built-in tee() to read very large streams in parallel at different speeds, because it fully backpressures to the speed of the slower-consumed branch. If you need the branches to be read independently, look for (or write) an implementation that does not throttle the faster branch.
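One such alternative can be sketched as a tee that eagerly pumps the trunk and queues each chunk for both branches, so that neither branch's read speed throttles the other. `bufferingTee` is a hypothetical name, not a Workers API, and the trade-off is unbounded buffering proportional to how far the branches drift apart:

```javascript
// Hypothetical bufferingTee(): pumps the source as fast as it can and hands
// every chunk to both branches, so the faster branch is never throttled.
// Trade-off: chunks the slower branch hasn't read yet pile up in memory.
function bufferingTee(source) {
  const controllers = [];
  const branches = [0, 1].map(
    (i) => new ReadableStream({ start(c) { controllers[i] = c; } })
  );
  (async () => {
    const reader = source.getReader();
    try {
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        // Enqueue the same chunk object on both branches (no copy).
        for (const c of controllers) c.enqueue(value);
      }
      for (const c of controllers) c.close();
    } catch (err) {
      for (const c of controllers) c.error(err);
    }
  })();
  return branches;
}
```

Note that this just moves the problem: with a 1 GB asset and a very slow client, up to 1 GB of chunks can accumulate in the slower branch's queue, so it only fits within Workers memory limits for moderately sized objects.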
The linked documentation states the opposite (namely, that the faster branch is not backpressured, and that unread data is instead enqueued for the slower one):
A teed stream will partially signal backpressure at the rate of the faster consumer of the two ReadableStream branches, and unread data is enqueued internally on the slower consumed ReadableStream without any limit or backpressure.
My exact problem is that the faster branch is backpressured to the speed of the slower one, and the put therefore times out.
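The documented behavior matches the WHATWG Streams spec semantics, which can be checked outside Workers. Under Node.js, for example, one branch of a spec-compliant tee() can be drained to completion while the other has not been read at all, with the unread chunks held in an internal queue:

```javascript
// Spec tee() semantics: drain one branch fully before touching the other.
async function specTeeDemo() {
  let i = 0;
  const source = new ReadableStream({
    pull(c) {
      if (i < 100) c.enqueue(new Uint8Array(1024).fill(i++));
      else c.close();
    },
  });
  const [fast, slow] = source.tee();

  const drain = async (stream) => {
    let bytes = 0;
    const reader = stream.getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.byteLength;
    }
    return bytes;
  };

  // The fast branch drains to completion even though slow has read nothing:
  const fastBytes = await drain(fast);
  // The slow branch still gets every chunk afterwards, from the internal queue:
  const slowBytes = await drain(slow);
  return { fastBytes, slowBytes };
}
```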
In our implementation, we have modified tee() to avoid this issue.
Each branch maintains its own data buffer. But instead of those buffers containing copies of the data, they contain refcounted references to it. The backpressure signaling to the trunk is based on the branch with the most unconsumed data in its buffer.
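That design can be sketched in plain JavaScript, with the caveat that this is illustrative, not workerd's actual (C++) implementation: `refTee` and `highWaterMark` are names chosen here, branches push references to the same chunk objects (the GC stands in for refcounting), and the trunk is pulled only while the most-backlogged branch stays under the high-water mark.

```javascript
// Illustrative sketch of the described tee(): branches share chunk references
// (no copies; the JS garbage collector plays the role of refcounting), and
// the trunk is pulled only while the branch with the MOST unconsumed data
// stays under a high-water mark. Not workerd's actual implementation.
function refTee(source, highWaterMark = 64 * 1024) {
  const reader = source.getReader();
  const buffers = [[], []];   // per-branch queues of shared chunk references
  const buffered = [0, 0];    // unconsumed bytes per branch
  let waiters = [];           // branch reads waiting for trunk data
  let done = false;
  let pumping = false;

  const wake = () => { const w = waiters; waiters = []; w.forEach((r) => r()); };

  async function pump() {
    if (pumping || done) return;
    pumping = true;
    // Trunk backpressure: stop pulling while the most-backlogged branch
    // holds highWaterMark or more unconsumed bytes.
    while (!done && Math.max(buffered[0], buffered[1]) < highWaterMark) {
      const { done: d, value } = await reader.read();
      if (d) { done = true; break; }
      for (let i = 0; i < 2; i++) {
        buffers[i].push(value);          // a reference, not a copy
        buffered[i] += value.byteLength;
      }
      wake();
    }
    pumping = false;
    if (done) wake();
  }

  const branch = (i) =>
    new ReadableStream({
      async pull(controller) {
        while (buffers[i].length === 0 && !done) {
          const woken = new Promise((res) => waiters.push(res));
          pump();
          await woken;
        }
        if (buffers[i].length === 0) { controller.close(); return; }
        const chunk = buffers[i].shift();
        buffered[i] -= chunk.byteLength;
        controller.enqueue(chunk);
        pump();  // consuming may reopen the trunk's backpressure window
      },
    });

  return [branch(0), branch(1)];
}
```

Note the effect this produces: if one branch stops reading, the trunk stalls as soon as that branch's backlog reaches the high-water mark. That bounds memory use, at the cost of throttling the other branch, which is exactly the behavior observed in the question above.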