"A hanging Promise was canceled" - Rust (wasm) Worker

I am writing a very simple Worker in Rust. It works as expected when requests are sent sequentially, but as soon as requests are sent concurrently it starts failing non-deterministically with a 500 response and the following error (taken from the wrangler dev log; the same error occurs when I use wrangler publish):

HTTP/1.1 500 Internal Server Error
A hanging Promise was canceled. This happens when the worker runtime is waiting for a Promise from JavaScript to resolve, but has detected that the Promise cannot possibly ever resolve because all code and events related to the Promise's request context have already finished.
Uncaught (in response)
Error: The script will never generate a response.

Any idea how to fix this? I am pretty sure that the promise is not in fact a “hanging” promise, because my Rust tests (which do little more than call the worker and check the response) pass whenever I run them sequentially using cargo test -- --test-threads=1, but start failing once I run the tests concurrently using cargo test. I’m awaiting the JS promise in Rust using a wasm_bindgen generated async binding.

Here’s a minimal test case to reproduce the behavior: https://github.com/fkettelhoit/cf-worker-rust-hanging-promise
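For readers who want to see the difference between the two test modes without the Rust test harness, here is a minimal JavaScript sketch. It is not the code from the linked repo. `send` is a hypothetical function that issues one request and returns a promise, and `makeFakeSender` is an illustrative stand-in that only counts how many requests are in flight at once.

```javascript
// Send `count` requests one after another: at most one is in flight.
async function runSequentially(send, count) {
  const results = [];
  for (let i = 0; i < count; i++) {
    results.push(await send(i));
  }
  return results;
}

// Start all `count` requests before awaiting any of them.
function runConcurrently(send, count) {
  return Promise.all(Array.from({ length: count }, (_, i) => send(i)));
}

// Hypothetical fake sender: takes 5 ms per "request" and records the
// highest number of requests that were in flight at the same time.
function makeFakeSender() {
  let inFlight = 0;
  const stats = { maxInFlight: 0 };
  const send = async (i) => {
    inFlight++;
    stats.maxInFlight = Math.max(stats.maxInFlight, inFlight);
    await new Promise(resolve => setTimeout(resolve, 5));
    inFlight--;
    return i;
  };
  return { send, stats };
}
```

Against a real Worker, `send` could be something like `() => fetch(workerUrl)`, where `workerUrl` is whatever address the Worker is served from (an assumption, not something taken from the repo).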

I just tested it again, and although the issue now occurs much more rarely, it still persists. At first I thought it might have been fixed for workers published using wrangler publish (10 parallel requests now usually pass in the test, whereas they almost always failed when I first created this topic), but the updated test still fails for 50 to 100 parallel requests. With wrangler dev the test fails at a much lower number of parallel requests (as soon as 2 requests are processed in parallel, as far as I can tell).

I have pushed the updated tests to the linked repo.

Has anyone at Cloudflare been able to look at the issue? It still persists for me and is also affecting others:

I am still in the process of evaluating Workers and this is really a blocker: there is no way to run it in production if 2 parallel requests may cause the worker to fail…


Your best bet is to catch the Workers Dev Team in their Discord:

I raised the issue on Discord already, but did not receive a reply there either. Discord is probably not an ideal place for such an in-depth discussion anyway, especially for people like me in non-US time zones. I will try it there again though, thanks for the suggestion.


I personally don’t know anything about Rust or wasm-bindgen, so I can’t identify the specific issue, but maybe I can help understand the problem.

This message happens when the Workers Runtime determines that the request is not done yet (no response has been returned), but it also isn’t waiting on anything anymore (e.g. there are no outstanding fetch() requests, setTimeout(), etc.), therefore it decides that the request will never complete. The runtime makes this determination on a per-request basis, even if there are multiple requests running in the same isolate.
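As a rough illustration of that distinction in plain JavaScript (this does not reproduce the runtime's own detection logic; the function names are made up for the example): a promise that nothing can ever resolve versus one that still has a pending timer behind it.

```javascript
// Nothing holds a reference to this promise's resolve function,
// so no timer, fetch, or other event can ever settle it.
function hangingPromise() {
  return new Promise(() => {});
}

// A pending setTimeout() will resolve this promise after 10 ms.
function timerBackedPromise() {
  return new Promise(resolve => setTimeout(() => resolve("done"), 10));
}

// Illustration helper (hypothetical): resolves to true if `promise`
// settles within `ms` milliseconds, and to false otherwise.
function settlesWithin(promise, ms) {
  const deadline = new Promise(resolve => setTimeout(() => resolve(false), ms));
  return Promise.race([promise.then(() => true, () => true), deadline]);
}
```

A request handler awaiting something like `hangingPromise()` with no other work outstanding is the situation the error message describes; awaiting `timerBackedPromise()` is not, because the timer is still pending.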

Now, your code appears to be waiting on a setTimeout(). The runtime should see that and decide that the request is still “doing something” and it should wait.

My guess is that somehow, the requests are interfering with each other deep within the Rust / wasm-bindgen promise state. Somehow, the fact that another request is in-flight is causing one request’s promise not to complete on time. Maybe the bindings are incorrectly waiting for all outstanding promises to complete before allowing any of the waiters to continue?

In any case, this seems like it must be a bug in wasm-bindgen… but I’m afraid I don’t know much about it.

Thanks for the explanation and for taking the time to look at the issue, @KentonVarda!

Has someone else been able to investigate the issue? I can’t exclude the possibility that the bug originates in wasm-bindgen, but it would have to be a bug pretty deep down in some of the core Rust libraries, as wasm-bindgen is about as official as Rust’s wasm tooling gets and is even used in Cloudflare’s own Rust Worker template (https://github.com/cloudflare/rustwasm-worker-template and https://developers.cloudflare.com/workers/tutorials/hello-world-rust).

In case it is a bug in wasm-bindgen, does that mean that awaiting promises in Rust Workers is not supported? There is not much I can do to debug the issue any further on my side and I think opening an issue in the wasm-bindgen repository is not really an option as I cannot reproduce the issue outside of the Worker runtime, which is of course not open source, so the debugging would need to happen on the Cloudflare side.

At this point I’m not sure how to proceed, as the only option without a fix seems to be a rewrite of the whole Worker in JS. I fully understand that wasm Workers are kind of a second class citizen on the Workers runtime, but I had hoped that awaiting promises from wasm would be supported somehow.

@fkettelhoit - after some investigation, you can fix this by calling wasm_bindgen(wasm) once at the global scope and awaiting the Promise it returns at the start of each request:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

// Instantiate the wasm module once, at global scope. The Promise returned by
// `wasm_bindgen(wasm)` is awaited inside the request handler.
const { handle } = wasm_bindgen;
const instance = wasm_bindgen(wasm);

/**
 * Respond to a request with the greeting produced by the wasm `handle` export
 * @param {Request} request
 */
async function handleRequest(request) {
  await instance; // not sure if there is a more optimal way to check if this is resolved with some conditional?

  const objectWithAsyncMethod = {
    run: () => new Promise(resolve => setTimeout(resolve, 10))
  };

  const greeting = await handle(objectWithAsyncMethod);
  return new Response(greeting, {status: 200});
}

Running your tests with this change handles all the requests and passes the tests.

To clarify, the change here is to run wasm_bindgen once at the global scope, not per-request. But since we don’t currently support top-level await, you do have to await the promise at the start of each request (which will always complete immediately, so no big deal).

The problem previously was that running wasm_bindgen at the start of every request meant that the previous request’s state was being clobbered, which was bad if it was still running.
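The clobbering can be imitated without wasm at all. The sketch below is a simplified model, not wasm-bindgen's actual glue code: `init`, `handle`, `state` and the other names are hypothetical stand-ins. `init()` replaces shared module state the way re-instantiating the module would, and `handle()` completes through a timer that looks its resolver up in whatever state is current when the timer fires.

```javascript
// Hypothetical stand-in for the module state owned by the generated glue.
let state = null;
let nextId = 0;

// Re-running init() throws away any resolvers registered so far.
function init() {
  state = { pending: new Map() };
  return Promise.resolve();
}

// Simulated async export: registers a resolver, then completes via a
// 10 ms timer that reads the resolver from the *current* state.
function handle() {
  const id = nextId++;
  return new Promise(resolve => {
    state.pending.set(id, resolve);
    setTimeout(() => {
      const resolver = state.pending.get(id);
      if (resolver) {
        state.pending.delete(id);
        resolver("ok");
      } // otherwise the resolver was clobbered and the promise hangs
    }, 10);
  });
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Report "hung" instead of waiting forever on a lost promise.
const withDeadline = (promise, ms) =>
  Promise.race([promise, sleep(ms).then(() => "hung")]);

// Old pattern: initialise at the start of every request.
async function perRequestInit() {
  await init();
  return withDeadline(handle(), 100);
}

// Fixed pattern: initialise once, await the same promise in each request.
let ready = null;
async function sharedInit() {
  ready = ready || init();
  await ready;
  return withDeadline(handle(), 100);
}

// Start a second request 5 ms after the first, while the first is in flight.
async function staggered(handler) {
  const first = handler();
  await sleep(5);
  const second = handler();
  return Promise.all([first, second]);
}
```

In this model, `staggered(perRequestInit)` leaves the first request hanging because the second request's `init()` discards its resolver, while `staggered(sharedInit)` completes both.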

@nilslice @KentonVarda thanks, it works perfectly now! I have updated the linked repo as well as two PRs to reflect this change, hope this might be useful:
