Strange Error 1101 on the 10th refresh of the page

I’m trying to use Workers to resize images, based on https://github.com/cloudflare/rustwasm-worker-template
I’ve been fighting Error 1101 for days without understanding what the problem is. Try loading:
Sample image. If you refresh the page 10 times, you get Error 1101. Funnily enough, if you open an incognito browser window, you can load the page again (at most 10 times).
Can somebody help me understand what the problem is?
I’ve done some load testing via curl requests and everything works fine.

Here is the full code of my worker:

let removeHeaders = [
"x-bz-content-sha1",
"x-bz-file-id",
"x-bz-file-name",
"x-bz-info-src_last_modified_millis",
"x-bz-upload-timestamp"
]

async function fetchAndStream(request) {
  // Fetch from origin server.
  let response = await fetch(request)
  let newHdrs = new Headers(response.headers)
  removeHeaders.forEach(function(name){
    newHdrs.delete(name)
  })
  // Create an identity TransformStream (a.k.a. a pipe).
  // The readable side will become our new response body.
  let { readable, writable } = new TransformStream()

  // Start pumping the body. NOTE: No await!
  response.body.pipeTo(writable)

  // ... and deliver our Response while that's running.
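  // Spreading a Response copies nothing (its fields are prototype getters), so pass the status explicitly.
  return new Response(readable, {status: response.status, statusText: response.statusText, headers: newHdrs})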
}
// REQUIRED: When configuring this worker script, in the UI, go to the "resource" tab, create a
//   new WebAssembly module resource, name it "RESIZER_WASM", and upload resizer.wasm.
//   OR, upload via the API (see `upload` in Makefile).

// Instantiate the WebAssembly module with 32MB of memory.
const wasmMemory = new WebAssembly.Memory({initial: 512});
const wasmInstance = new WebAssembly.Instance(
    // RESIZER_WASM is a global variable created through the Resource Bindings UI (or API).
    RESIZER_WASM,

    // This second parameter is the imports object. Our module imports its memory object (so that
    // we can allocate it ourselves), but doesn't require any other imports.
    {env: {memory: wasmMemory}})

// Define some shortcuts.
const resizer = wasmInstance.exports
const memoryBytes = new Uint8Array(wasmMemory.buffer)

// Now we can write our worker script.
addEventListener("fetch", event => {
  event.respondWith(handle(event.request))
});

async function handle(request) {
  // Forward the request to our origin.
  let response = await fetch(request)
  let newHdrs = new Headers(response.headers)
  removeHeaders.forEach(function(name){
    newHdrs.delete(name)
  })
  // Check if the response is an image. If not, we'll just return it.
  let type = response.headers.get("Content-Type") || ""
  if (!type.startsWith("image/")) return response

  // Check if the `width` query parameter was specified in the URL. If not,
  // don't resize -- just return the response directly.
  let width = new URL(request.url).searchParams.get("width")
  if (!width) return response

  // OK, we're going to resize. First, read the image data into memory.
  let bytes = new Uint8Array(await response.arrayBuffer())

  // Call our WebAssembly module's init() function to allocate space for
  // the image.
  let ptr = resizer.init(bytes.length)

  // Copy the image into WebAssembly memory.
  memoryBytes.set(bytes, ptr)

  // Call our WebAssembly module's resize() function to perform the resize.
  let newSize = resizer.resize(bytes.length, parseInt(width, 10))

  if (newSize == 0) {
    // Resizer didn't want to process this image, so just return it. Since
    // we already read the response body, we need to reconstruct the
    // response here from the bytes we read.
    return new Response(bytes, response);
  }

  // Extract the result bytes from WebAssembly memory.
  let resultBytes = memoryBytes.slice(ptr, ptr + newSize)

  // Create a new response with the image bytes. Our resizer module always
  // outputs JPEG regardless of input type, so change the header.
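  // Spreading a Response copies nothing (its fields are prototype getters), so pass the status explicitly.
  let newResponse = new Response(resultBytes, {status: response.status, statusText: response.statusText, headers: newHdrs})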
  newResponse.headers.set("Content-Type", "image/jpeg")

  // Return the response.
  return newResponse
}


You’re probably accumulating too much RAM usage. If you run a few requests in the browser, they will hit the same worker process (thus giving a 1101 eventually), but if you do it via curl you’d probably get a new worker more often.

Since it’s a 1101 and not a 1102 (resource exhaustion), I’d assume that when the wasm doesn’t get enough RAM it simply crashes, thus giving a 1101. Have you tried assigning it more RAM?

Keep in mind, though, that even when using wasm, workers do not have enough resources to resize larger images. Maybe 1 or 2 or 10 will work, but when requests accumulate in production, this becomes more of a problem.

@thomas4 I have been looking into this problem for some time, but there is no way to figure out the memory consumption or CPU utilization of a worker. The error is sometimes 1101 and sometimes 1102. What I don’t understand is why it works again if you open incognito mode. I’m using the same IP, so I should hit the same CF data center.
I also don’t understand the accumulation of memory. The memory is allocated in the global scope, so it should be shared across requests. I’ve attached the code to the issue above.

This code was not invented by me; it is offered as sample code for the wasm implementation: https://github.com/cloudflare/cloudflare-workers-wasm-demo , https://developers.cloudflare.com/workers/templates/pages/emscripten/

Somehow I expect this to work out of the box by just following the instructions.

I’ve also tried playing with the memory allocation, lowering it to const wasmMemory = new WebAssembly.Memory({initial: 64}); or making it bigger, and the funny thing is that I still get the error on the 10th refresh.
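
For reference, the `initial` value is counted in WebAssembly pages of 64 KiB, so 512 pages is 32 MiB and 64 pages is 4 MiB. The sketch below also shows one thing that may be worth ruling out: a `Uint8Array` created once in the global scope goes stale if the memory is ever grown. Whether the resizer module grows its imported memory is an assumption here, and `pagesToMiB` and `freshView` are hypothetical helper names, not part of the demo.

```javascript
// WebAssembly memory is sized in pages of 64 KiB each.
const WASM_PAGE_BYTES = 64 * 1024;

// Convert a page count (the `initial` option) to MiB.
function pagesToMiB(pages) {
  return (pages * WASM_PAGE_BYTES) / (1024 * 1024);
}

// Hypothetical helper: build the byte view on demand instead of caching it
// globally, because growing the memory detaches the previous ArrayBuffer.
function freshView(memory) {
  return new Uint8Array(memory.buffer);
}

// Small demonstration with a one-page memory.
const demoMemory = new WebAssembly.Memory({ initial: 1 });
const staleView = freshView(demoMemory);   // taken before the grow
demoMemory.grow(1);                        // detaches the old buffer
const currentView = freshView(demoMemory); // sees both pages
```

In the worker, this would mean calling `freshView(wasmMemory)` inside `handle()` after `resizer.init()` rather than reusing the global `memoryBytes`.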

CF saves a cookie, which is returned on every request, but if you instead use curl or incognito, you’ll get a fresh cookie every time. Thus, you’re not seeing the issue there because it doesn’t accumulate.

You can add /cdn-cgi/trace at the end of your worker URL:
https://images.vcloud42.com/cdn-cgi/trace
… to see exactly which CF colo you’re hitting and whether it’s the same one or a new one. This will make it easier for you to debug this issue.
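
The trace endpoint returns plain text with one key=value pair per line, so comparing fields such as `colo` and `fl` across refreshes is easy to script. A minimal sketch, where `parseTrace` is a hypothetical helper and the sample values are made up for illustration:

```javascript
// Parse the key=value lines of a /cdn-cgi/trace response into an object.
function parseTrace(text) {
  const fields = {};
  for (const line of text.split("\n")) {
    const idx = line.indexOf("=");
    if (idx > 0) {
      fields[line.slice(0, idx)] = line.slice(idx + 1).trim();
    }
  }
  return fields;
}

// Made-up sample body; real responses contain more fields.
const sampleTrace = "fl=000f000\ncolo=AMS\n";
const parsedTrace = parseTrace(sampleTrace);

// Against the live endpoint (not executed here):
// fetch("https://images.vcloud42.com/cdn-cgi/trace")
//   .then(res => res.text())
//   .then(text => console.log(parseTrace(text).colo))
```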

In any case, there’s not enough CPU-time for resizing images. (I’ve yet to see anyone succeed)

Thank you for the tip @thomas4. It seems I’m getting a different fl id each time, so I assume different workers. So is this the same problem as this issue?

If this is the case, then the whole WASM story is quite thin. I wonder how this can even be used if the only demo offered for Cloudflare Workers does not actually work.

Thank you for the quick reply, but I think some debugging tools for CPU utilization and memory accumulation are needed, because we are in the dark here.

I agree fully @valentin.vieriu, and I’ll be the first to throw myself at publishing the half-done projects I’ve accumulated during the year that will not work due to too little CPU/RAM, when CF finally adds the option for more :wink:

As you can see, it’s on the roadmap and certainly a priority. They do want to compete with Lambdas, and this is a very large limiting factor.

@thomas4 could you maybe explain how you check for the RAM accumulation? Is there a way to partially debug it? I see no information about that in the /cdn-cgi/trace endpoint.

Unfortunately, debugging, measuring and monitoring tools are lacking. They are also a priority for CF to add, and I’d even guess they’re arriving sooner rather than later.

The only thing you can go on is the Ray ID, the one at the top. You can see that it stays the same as long as the CF cookie is the same. It doesn’t seem to take into account whether the Worker you’re currently hitting will restart, though, so don’t trust it blindly.

You might notice that Workers are still very much a work in progress.

However, there are a lot of benefits that others don’t have, especially when it comes to ease of use, performance and reliability. The KV they have built is very impressive when combined with Workers, and all for a fraction of the price of others (especially Lambdas).

Though, I’d assume that a lot of the “addons” that we need will come at an additional cost.

Thank you again for the quick reply. I started working with Workers when they were in beta. I really liked the idea of lambda on the edge, but then gave up because they felt unfinished.
I came back months later hoping they would be more mature, but I’m quite disappointed.
It does not feel like a product out of beta. :pensive:

I’m actually glad to have been part of the journey, and staying inside the limits has made me a better developer in turn (especially security/encryption), but I understand the frustration of a developer who just wants to get things done. At the same time, CF is doing something unique and at global scale; even before it started, I expected this to take time.

I also use Lambdas to complement workers: I basically use Lambda tasks to retrieve or process data and then send it to a receiving worker endpoint for customer consumption.

Update: The marketing should really reflect the current state of limitations.

Hi @valentin.vieriu, I think that’s exactly the issue you’re experiencing here: each individual request is using more than 50ms CPU time, and eventually the CPU time limiter kicks in for that worker. The CPU time consumption could theoretically be caused by memory pressure (the GC at work), but in this case I think it’s just the raw image resize computation that is too much. The reason I believe that is because there doesn’t appear to be any memory leak in your worker – no global data structures grow per-request, and WASM just uses the memory you give it.
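
Since the per-request CPU cost is what trips the limiter, one possible mitigation is to skip the resize for inputs above some size and pass the original image through. This is only a sketch under assumptions: `MAX_RESIZE_BYTES` and `shouldResize` are hypothetical names, the 1 MiB budget is an arbitrary illustration rather than a measured threshold, and compressed size is only a rough proxy for how expensive the decode and resize will be.

```javascript
// Hypothetical budget: inputs larger than this are passed through untouched.
const MAX_RESIZE_BYTES = 1024 * 1024;

// Decide from the Content-Length header value (a string, or null when the
// header is absent) whether the image is small enough to attempt a resize.
function shouldResize(contentLength, maxBytes = MAX_RESIZE_BYTES) {
  const size = Number.parseInt(contentLength, 10);
  return Number.isInteger(size) && size > 0 && size <= maxBytes;
}

// Inside handle(), before reading the body (not executed here):
// if (!shouldResize(response.headers.get("Content-Length"))) return response
```

An unknown length is treated as "do not resize", which errs on the side of returning the original image rather than risking the CPU limit.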

The earliest description of the problem I can find is Kenton’s, here (last paragraph): Long running WebCrypto API?

While the problem is not directly related to the latest release, we do include a fix in the latest release, but only for the preview service:

Fixed a bug in the preview service where the CPU time limiter was overly lenient for the first several requests handled by a newly-started worker. The same bug actually exists in production as well, but we are much more cautious about fixing it there, since doing so might break live sites. If you find your worker now exceeds CPU time limits in preview, then it is likely exceeding time limits in production as well, but only appearing to work because the limits are too lenient for the first few requests. Such workers will eventually fail in production, too (and always have), so it is best to fix the problem in preview before deploying.

That change should be live now, so if you could try your script again in the preview, our expectation is you’ll see an 1102 error much earlier, which should, in theory, help you optimize the script. The behavior of the deployed script will remain the same for now, however.

I say “in theory help you optimize the script”, because in this case I’m not sure how much more optimal you can make image resizing – it is inherently CPU-intensive, and its time complexity is a function of the input size, which you may or may not control. Metered CPU billing (on our roadmap) will eventually provide an option here; another option you could use today is our Image Resizing feature, though that does require a BIZ zone.
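
For anyone with the Image Resizing feature enabled on their zone, a worker can delegate the resize to it through the `cf.image` options of `fetch()` instead of running WASM. A minimal sketch, assuming the same `width` query parameter as the script above; `imageResizeInit` is a hypothetical helper name:

```javascript
// Build the fetch() init object that asks Image Resizing for a given width,
// or return null when the URL has no usable `width` query parameter.
function imageResizeInit(requestUrl) {
  const raw = new URL(requestUrl).searchParams.get("width");
  const width = Number.parseInt(raw, 10);
  if (!Number.isInteger(width) || width <= 0) return null;
  return { cf: { image: { width } } };
}

// Inside the worker's fetch handler (not executed here):
// const init = imageResizeInit(request.url)
// return init ? fetch(request, init) : fetch(request)
```

The resize then runs outside the worker's own CPU budget, so the worker only parses the URL and forwards the request.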

I agree 100%. We’re acutely aware of this, and we’re working on providing those tools.

Harris
