Massive rate-limiting issues with Worker in production

A kind technical support engineer got back to us and mentioned he’s bringing our use case to the Workers team for a review - let’s see :slight_smile:

PS: Implementing a batch collector for stats in a worker is pretty hard, it seems; my first approach didn’t work (it resulted in exotic errors like “Cannot clear a timeout created in a different request context”). :smile:

You should be able to batch stuff; you just won’t get any data until the batch is full. Or you set up something to ping your worker on some interval.
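Something along these lines, roughly: a minimal sketch with made-up names (MAX_BATCH, the /__flush path and the stats endpoint are placeholders, not anyone’s actual setup). It flushes once the buffer is full, and exposes a path that an external cron job or uptime monitor can ping on an interval so data still goes out during quiet periods.

```js
// Sketch only: MAX_BATCH, the /__flush path and the stats endpoint are placeholders.
const MAX_BATCH = 50;
let buffer = []; // module-scope state, shared by requests hitting the same instance

async function flushBuffer() {
  if (buffer.length === 0) return;
  const batch = buffer;
  buffer = [];
  await fetch('https://stats.example.com/bulk', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(batch),
  });
}

addEventListener('fetch', event => {
  const url = new URL(event.request.url);

  // The interval ping: an external cron job or monitor hits this path so
  // buffered data gets drained even when traffic is low.
  if (url.pathname === '/__flush') {
    event.waitUntil(flushBuffer());
    event.respondWith(new Response('flushed'));
    return;
  }

  buffer.push({ path: url.pathname, ts: Date.now() });
  if (buffer.length >= MAX_BATCH) {
    // Upload happens after the response has been sent.
    event.waitUntil(flushBuffer());
  }
  event.respondWith(fetch(event.request));
});
```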

So, clearly the right hand is not talking to the left hand here, and I am very sorry about both the confusion and delay it’s caused. I think we’ve gotten our policy clarified internally now, so you should see the tickets getting resolved with the limits lifted. Let me know if that is not the case.

For the purposes of the rate limit being discussed in this thread, the best course of action is still to get it lifted via Support.

However, if you’re more generally worried about hitting rate limits upstream (on the request path) of the worker, then batching is probably the best way to go, assuming the upstream supports some sort of bulk upload API. It can be a little tricky to get right, since different request contexts are limited to communicating via simple global values (e.g., not promises or streams), and can’t manipulate each other’s timeouts/intervals.
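To illustrate that constraint with a toy sketch (the names and the 5-second delay are just placeholders): instead of clearing a previous request’s timeout, which fails with the “Cannot clear a timeout created in a different request context” error quoted above, the requests coordinate through a plain shared counter, and a superseded timeout simply does nothing when it fires.

```js
// Illustrative sketch only: a later request can't clearTimeout() a handle
// created by an earlier one, but it can bump a plain global value that the
// earlier timeout checks before doing any work.
let generation = 0; // simple global value, safe to share across requests

addEventListener('fetch', event => {
  event.respondWith(fetch(event.request));

  const myGeneration = ++generation; // every newer request supersedes this one
  event.waitUntil(new Promise(resolve => {
    setTimeout(() => {
      if (myGeneration === generation) {
        // still the most recent request; do the deferred work here
      }
      resolve(); // either way, release waitUntil()
    }, 5000);
  }));
});
```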

Harris

3 Likes

Alrighty I guess I’ll experiment.

I’m not trying to sound too negative. I appreciate everything you guys are doing! There are all sorts of edge cases you must be dealing with.

But I really don’t see any reason not to just be explicit. People are getting blindsided by this limit. I get what you guys are trying to prevent, but there are easier ways to send a bunch of requests from lots of IPs. And it wouldn’t be that hard for someone to get around it anyway.

Thanks for getting support on the same page, this will help!

3 Likes

@harris - that’s great news - thanks so much! :tada: Our ticket (1780994) hasn’t received an update yet (no further response, still open) but I’ll give it some more time :slight_smile: I wanna make sure the limits are lifted for our account before pushing the worker to production and popping corks again :sweat_smile:

Yeah, I managed to find a way to batch data but it wasn’t easy :smile:

I ended up implementing the concept of a “torch holder” to make sure there’s always a request that’s responsible for emitting the data if no further request comes in before the worker is recycled.

The gist: I use event.waitUntil() on every incoming request, instantiate a TorchHolder, and hand waitUntil() a promise that I can resolve from the outside. In addition, each TorchHolder has a local expiry timeout of 25s (event.waitUntil() has a 30s max limit). Whenever a new request comes in, I check whether the maxDataEntries or maxTimeInterval threshold has been reached. If it has, I use that request’s event to emit the data. If it hasn’t, the old TorchHolder’s promise is resolved, and the new request gets its own promise (via event.waitUntil()) and becomes the new TorchHolder, which acts as a safeguard: it makes sure the data is sent within 25s (at its expiry) if no further request/new torch holder comes in.

That sounds quite funky, but it works. The only thing I don’t like about it is that, as you’ve mentioned, I cannot clear the expiry timeout of the TorchHolder anymore. For now I just set isFinished as local state on the TorchHolder when I resolve the promise, which turns the timeout into a no-op. I don’t think these no-op dummy timeouts (one per incoming request) should add to “CPU time” or decrease performance.
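For anyone curious, here’s a stripped-down sketch of the pattern. The thresholds, the stats endpoint and the flushBuffer() helper are placeholders rather than the actual code, but it shows the hand-over between requests and the isFinished no-op trick.

```js
// Stripped-down sketch of the torch-holder pattern; thresholds, endpoint and
// flushBuffer() are placeholders rather than the actual implementation.
const MAX_DATA_ENTRIES = 100;     // flush once this many entries are buffered...
const MAX_TIME_INTERVAL = 60000;  // ...or once the last flush is this old (ms)
const TORCH_EXPIRY = 25000;       // stays below the ~30s event.waitUntil() limit

let buffer = [];
let lastFlushAt = Date.now();
let currentTorch = null; // plain global reference, shared across requests

class TorchHolder {
  constructor() {
    this.isFinished = false;
    this.promise = new Promise(resolve => { this.resolve = resolve; });
    // Safeguard: if no further request shows up, emit the buffered data
    // before this request context (and its waitUntil) goes away.
    setTimeout(() => {
      // Resolving with the flush promise keeps waitUntil() waiting for it.
      this.resolve(this.isFinished ? undefined : flushBuffer());
    }, TORCH_EXPIRY);
  }
  // Called by a newer request. It can't clearTimeout() our expiry timer from
  // its own context, so the flag just turns that timer into a no-op.
  finish() {
    this.isFinished = true;
    this.resolve();
  }
}

async function flushBuffer() {
  if (buffer.length === 0) return;
  const batch = buffer;
  buffer = [];
  lastFlushAt = Date.now();
  await fetch('https://stats.example.com/bulk', {
    method: 'POST',
    body: JSON.stringify(batch),
  });
}

addEventListener('fetch', event => {
  event.respondWith(fetch(event.request));
  buffer.push({ url: event.request.url, ts: Date.now() });

  const thresholdReached =
    buffer.length >= MAX_DATA_ENTRIES ||
    Date.now() - lastFlushAt >= MAX_TIME_INTERVAL;

  if (thresholdReached) {
    // Threshold hit: use this request's event to emit the data right away.
    event.waitUntil(flushBuffer());
  } else {
    // Hand over the torch: release the previous holder and keep this
    // request's context alive (up to 25s) as the new safeguard.
    if (currentTorch) currentTorch.finish();
    currentTorch = new TorchHolder();
    event.waitUntil(currentTorch.promise);
  }
});
```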

I’ll do some further hardening/testing and might open-source that bit if others are interested. My worker router was quite well received, so this might be a nice addition towards a toolset for building less trivial workers.

Update:

Support confirmed that the limit has been raised (hopefully enough :sweat_smile:) for our domain - thanks again to the community and @harris in particular for participating in this issue :slight_smile:

We plan to do a staged production rollout later today. Will do a final update here once the dust has settled.

1 Like

As promised, a final update: since the limits have been lifted and we redeployed the worker, we’ve had zero issues. Absolutely none. Smooth sailing all around. It’s actually running suspiciously well. :smile:

Thanks again for the kind support.

4 Likes