Cloudflare Workers VERY slow with moderate-sized WebAssembly bindings

I have a worker (code here) that renders html templates.

Much of the logic is written in Rust, mainly to allow the templates to be rendered using the Tera template rendering library.

Everything seemed to be going well…

However there’s a big problem: running the worker is VERY slow - roughly 2.5 to 3 s of extra time per request. That’s after trying everything to reduce the size of the compiled wasm; before that, responses took ~5 s!

That might not sound like much, but when Cloudflare Workers advertises adding only a few milliseconds to the response time, 3 seconds is ~1000x slower than expected.

These slow responses occur when the worker is not “hot”, i.e. loaded in memory (I’ve added a header to show when this is the case). When the worker is hot, the response time drops to a more reasonable few tens of milliseconds. However, when making requests continuously, only around 1 in 20 requests hits a hot worker - I’d estimate you’d need to be making ~100 requests/s to each data centre to have a good chance that most requests hit a hot worker.
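The header mentioned above can be implemented with a module-scope flag; a minimal sketch (hypothetical names, not the actual worker code):

```javascript
// Sketch: detect cold vs. warm starts with a module-scope flag.
// Module scope survives between requests while the isolate stays "hot",
// so only the first request after a cold start sees the flag unset.
let warmedUp = false;

function coldStartHeader() {
  const value = warmedUp ? 'hot' : 'cold';
  warmedUp = true; // every later request in this isolate reports "hot"
  return { 'x-worker-state': value };
}
```

Merging the returned object into each response’s headers makes the cold-start penalty visible from the client side.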

What am I doing wrong? How can this be fixed? Is the slowness here in loading the worker code from disk, or running some initialisation code that is run before the worker is executed?

If this is just “how it is”, Cloudflare Workers are effectively useless for running WebAssembly.

A few things to note:

  • In this PR I’ve tried everything to reduce the size, hence the marginally improved response time discussed above. There’s no longer a “Your built project has grown past the 1MiB size limit...” error; instead I get “Built successfully, built project size is 604 KiB.” (though ls -lh worker shows module.wasm is 2.9M - this is also weird)
  • I’ve tried every combination of the following to reduce size, nothing has made a significant difference:
    • opt-level options
    • wasm-opt options
    • other compile time options like lto = true
    • using wee_alloc
    • running wasm-snip on the generated module.wasm
  • The slow response time is not spent executing the Rust code, or even running await import('../pkg') - I’ve inserted a short circuit here and the response time when hitting the short circuit is basically the same
  • This worker is implemented using type = "webpack" in wrangler.toml with the wasm-pack-plugin webpack plugin to compile the wasm, but that’s not the problem either - I tried type = "rust", see this branch, but the performance is the same
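The short-circuit test mentioned in the notes above can be sketched roughly like this (handler name, module path, and return shape are hypothetical, not the actual worker code):

```javascript
// Hypothetical sketch of the short-circuit test: respond before the wasm
// module is ever imported. If cold responses are still slow on this path,
// the latency is in worker startup, not in loading or running the wasm.
const SHORT_CIRCUIT = true;

async function handleRequest(request) {
  if (SHORT_CIRCUIT) {
    // Never touches the wasm binding; any remaining cold-start latency
    // happens before this code runs.
    return { status: 200, body: 'short-circuited before wasm import' };
  }
  const wasm = await import('../pkg'); // wasm-pack output (hypothetical path)
  return { status: 200, body: wasm.render_template(request.url) };
}
```

Because the slow path is hit even with the short circuit enabled, the time must be going into something the runtime does before any of this JS executes.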

I am running into the same issue. I believe the reason is that on each cold start the WASM needs to be compiled into native code.

The following links talk more about compiling and caching with IndexedDB (deprecated), and implicit caching in V8.

Maybe Workers KV could be used to cache the compiled WASM if it’s under 10 MB (it should only need one read when each worker instance is instantiated, so it shouldn’t be too expensive). However, I didn’t want to try to modify the generated JS for initialising the WASM.

I am hoping that Cloudflare can provide an out-of-the-box solution for this.

Hi @nesh, thanks for the reply - that’s interesting, particularly the V8 implicit caching link.

Unfortunately that doesn’t work. I tried modifying the generated code to load the raw wasm from somewhere other than the binding object:

  • WebAssembly.instantiateStreaming is not available
  • WebAssembly.instantiate with an array buffer is blocked (I guess for the same reasons eval() is blocked) - you get CompileError: WebAssembly.instantiate(): Wasm code generation disallowed by embedder

I also modified the generated JS of my slow worker and inserted lots of console.log() statements. Unfortunately the slow bit happens before we get to WebAssembly.instantiate or WebAssembly.compile. The slow component (presumably getting the wasm object off disk) happens before any JS is executed.

It looks to me like there’s absolutely no workaround possible at the moment - the only way to run wasm is via Cloudflare’s binding system, and that binding system is jaw-achingly slow.

Very sad.

Ah, it could be something that can only be solved by Cloudflare themselves, since we don’t have permission.

I leaned away from it being a file-loading issue because the same problem occurs when running it on localhost through wrangler dev. Could it be that the logs are out of sync because the initialising code is async?

> It could be that the logs are out of sync because the initialising code is async?

I don’t think so because if you return from the worker before calling await import(...) you still get the slow performance.

Sounds like this is unsolvable without changes from Cloudflare, real shame.

Ah I see, hopefully we can get an official answer for this.

Quick update on this:

I asked [email protected] to get someone knowledgeable in the matter at CF to answer this question. That was 8 days ago. They’ve replied a number of times saying they’re looking into it, but there’s no answer yet.

I guess that’s an implicit admission that there’s a problem at their end…

I’ll update here when I hear anything.

Thanks for the update, hopefully they can solve the issue :pray: