Immediate `Error: Network connection lost` in module worker after `R2.put`

I've set up a Worker to download a file from an upstream URL and save it into R2. This works when I test in local mode in wrangler, but when I deploy, my 'download and cache into R2' operation fails almost immediately every time with `Error: Network connection lost`.

    if (!await env.MY_R2.get(r2Key)) {
      console.log(`Downloading from upstream: ${upstream.href}`);
      ctx.waitUntil(fetch(upstream.href)
        .then((response) => response.blob())
        .then((blob) => env.MY_R2.put(r2Key, blob))
        .then(console.log)
        .catch(console.error));
      // Temporarily return the upstream source while the R2 cache processes.
      return fetch(upstream.href);
    }

I'm trying to let the fetch-and-cache operation happen in the background with ctx.waitUntil, but something isn't letting it work properly and I'm not sure how to debug it.

Update 1: 2022-12-02 17:06 CDT
After digging a little more, I'm wondering if the failure happens so quickly each time because of an exclusive read lock on the upstream 'blob'. If the Blob is implemented as a wrapper around a ReadableStream, maybe another stream reader is holding a read lock and MY_R2.put hits an exception when it tries to read the 'blob' I'm uploading. Seems like a long shot / wild guess, though.
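
For context, here's a minimal sketch of what I mean by a read lock, using the plain Streams API (nothing R2-specific, purely illustrating my guess):

    // A ReadableStream can only have one active reader at a time.
    const stream = new ReadableStream({
      start(controller) {
        controller.enqueue(new Uint8Array([1, 2, 3]));
        controller.close();
      },
    });
    const reader = stream.getReader(); // acquires an exclusive lock
    console.log(stream.locked);        // true
    // Any other consumer now fails; e.g. a second getReader() call throws a
    // TypeError because the stream is already locked to a reader.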

I've tried downloading the file within the Worker before uploading it into R2. For large files (over 100MB) it complains, before even attempting the download, that it would exceed the local space; that's fine (and a separate issue). But for smaller files (around 30MB) I can download within the Worker in a couple of seconds and log a message, and then it immediately fails with the same Network connection lost message when I attempt to put it into the R2 bucket.

In the Workers docs (workers/runtime-apis/streams/readablestream#pipetooptions) I see mention of the pipe options preventClose and preventAbort. Could these be what I need to use (somehow)?
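
For reference, this is roughly how those options are used (sketch with placeholder source and destination streams; I'm not sure yet whether this even applies to my situation):

    // By default pipeTo() closes the destination when the source ends and
    // aborts it if the source errors; these flags suppress that behaviour.
    await source.pipeTo(destination, {
      preventClose: true, // leave the destination writable open afterwards
      preventAbort: true, // don't abort the destination if the source errors
    });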

Update 2: 2022-12-04 00:30 CDT
I switched my response.blob() to response.body and it just started working. :upside_down_face:
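
For reference, the working version is essentially my original snippet with the blob step removed (same upstream and r2Key as before):

    ctx.waitUntil(fetch(upstream.href)
      .then((response) => env.MY_R2.put(r2Key, response.body))
      .then(console.log)
      .catch(console.error));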

I think earlier, when I was seeing problems with response.body returning encoded chunks of binary, I was also returning the data from the fetch method of a Durable Object class back through the global fetch handler. So maybe the way I wrapped the blob in a Response there was my problem?

I still don't understand why passing a blob to R2.put immediately fails in my Worker, but passing a body works fine (except in miniflare, where either is OK). Do people just not use response.blob() very much in JS-land?

The Worker seems to complicate things. Don't do blob(): fetch the upstream once, tee the body, and serve one of the streams to the user and the other to R2.put within waitUntil, roughly as in the sketch below.
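
Roughly like this (untested sketch; upstream and r2Key are placeholders standing in for however you already compute them):

    export default {
      async fetch(req, env, ctx) {
        const upstream = new URL('https://example.com/file.zip'); // placeholder URL
        const r2Key = 'file.zip';                                 // placeholder key

        // Serve straight from R2 if it's already cached.
        const cached = await env.MY_R2.get(r2Key);
        if (cached) {
          const headers = new Headers();
          cached.writeHttpMetadata(headers);
          headers.set('etag', cached.httpEtag);
          return new Response(cached.body, { headers });
        }

        // One upstream fetch; tee so the same bytes go to the client and to R2.
        const response = await fetch(upstream.href);
        const [toClient, toR2] = response.body.tee();
        ctx.waitUntil(
          env.MY_R2.put(r2Key, toR2, { httpMetadata: response.headers })
            .catch(console.error)
        );
        return new Response(toClient, response);
      }
    }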

If I try to access the upstream binary through response.body, won't it be chunked and encoded through a ReadableStream? I actually started down this path at first and found that the blobs written to R2 were corrupt and nearly twice the size of the original zip files. The reason seemed to be that the binary was being encoded as it was transferred, and I would have had to create a reader from the body and process it myself.

The MDN web docs example for 'pumping' a ReadableStream does not look like a simpler solution to me. The MDN documentation for 'readable byte streams' looks like it could provide an efficient way to move the downloaded bytes into R2, but at the cost of potentially 100+ lines of code just to implement it. I'd like to avoid adding more complexity at this point.
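
For clarity, this is the sort of manual pump loop I mean (rough sketch, not something I want to maintain):

    // Read the body chunk by chunk and reassemble it before handing it to R2.put.
    async function bodyToBlob(body) {
      const reader = body.getReader();
      const chunks = [];
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        chunks.push(value);
      }
      return new Blob(chunks);
    }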

What I want to be able to do is this:

    if (!await env.MY_R2.head(r2Key)) {
      await fetch(upstream.href)
        .then((response) => response.blob())
        .then((blob) => env.MY_R2.put(r2Key, blob))
        .then(console.log)
        .catch(console.error);
    }
    // Return the blob from R2.
    const blob = await env.MY_R2.get(r2Key);
    const headers = new Headers();
    blob.writeHttpMetadata(headers);
    headers.set('etag', blob.httpEtag);
    return new Response(blob.body, {headers});

The maddening part is that this code works in wrangler against the local miniflare server - it just fails to work when deployed in Cloudflare. :slightly_frowning_face:

To my eyes, this is either a bug in miniflare (allowing me to do something it shouldn't) or a bug in Cloudflare (not allowing me to do something that works in miniflare).

Not at all?

    export default {
      async fetch(req, env, ctx) {
        // req.body is a ReadableStream; R2.put() accepts it directly.
        await env.R2.put('foo', req.body, { httpMetadata: req.headers });
        return new Response(null, { status: 201 });
      }
    }