In general this problem is solved with an APM solution that profiles each function and shows what takes the longest, but CF doesn’t offer that, and I’m not aware of any existing APM solutions that work with Workers.
But if it were something related to the worker code, wouldn’t it be more consistent under the exact same conditions (same DB data, same JS logic)?
I’ve been hitting that limit ever since I started with Workers. The only solution has been to optimize and use native features as much as possible, even skipping WASM in some cases.
I first started seeing the spikes with pretty small HTML documents (approx. 1 kB).
My first hunch was that the spikes were related to the size of the HTML (more HTML = more CPU work), so I modified the worker to produce a much larger document (approx. 250 kB), but the behavior is very similar: only a very small percentage of requests go above 10ms.
In fact, since I started using the worker, the median CPU time has been 1.7ms.
I don’t know, I’m starting to think Workers are not a good fit for this use case and I should probably move the HTML rendering somewhere else. It’s a shame, since I would have preferred to keep all my infra on Workers.
In my first year with Workers I had the same goal: everything in Workers. But the platform is still too limited, so I’ve moved quite a lot of the workload to AWS Lambda. It’s easy to keep execution under 100ms there (since I’d already optimized for Workers), so costs can be close to Workers’. Just keep in mind that their API Gateway incurs a cost for every request, and bandwidth has to be factored in too. I expect around 100–200% higher cost overall.
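For a rough sense of where that estimate comes from, here’s a back-of-the-envelope comparison. The prices are illustrative approximations (a 128 MB Lambda behind an HTTP API, request-based Workers pricing beyond the included quota); check current pricing pages before relying on them:

```js
// Back-of-the-envelope cost per 1M requests (illustrative prices).
const lambdaRequests = 0.20;         // $ per 1M Lambda invocations
const lambdaCompute  = 0.0000166667  // $ per GB-second
                     * (128 / 1024)  // 128 MB of memory
                     * 0.1           // 100ms per invocation
                     * 1e6;          // => ~$0.21 per 1M invocations
const apiGateway     = 1.00;         // $ per 1M HTTP API requests
const lambdaTotal    = lambdaRequests + lambdaCompute + apiGateway; // ~$1.41

const workersTotal   = 0.50;         // $ per 1M Workers requests (beyond quota)

// ~180% higher, in line with the 100-200% estimate above
// (Lambda bandwidth charges would push it higher still).
console.log(((lambdaTotal / workersTotal - 1) * 100).toFixed(0) + '%');
```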
If you do consider Lambda, use the Serverless Framework to manage it or you’ll be pulling your hair out in no time…
You can create a random worker ID, store it in a global variable along with a start time, and log both with each request. That will let you see whether it’s a fresh worker or not… We do this with Logflare, so you can actually see each invocation of each worker instance.
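A minimal sketch of that idea (the logging sink here is just console.log; swap in whatever you use):

```js
// Module-scope state persists across requests served by the same isolate,
// so an ID generated here identifies one worker instance.
const workerId = Math.random().toString(36).slice(2);
const startedAt = Date.now();
let requestCount = 0;

addEventListener('fetch', (event) => {
  requestCount++;
  // requestCount === 1 means this request hit a fresh worker.
  console.log(JSON.stringify({
    workerId,
    instanceAgeMs: Date.now() - startedAt,
    requestCount,
  }));
  event.respondWith(fetch(event.request));
});
```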
The difference is that this worker does practically nothing; it’s about 20 lines of code. It checks the Cache API and, if there is no cached response, fetches the response from KV.
The same happens with this one, which is a Workers Site doing the same thing, although with lower spikes.
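The whole worker is essentially this pattern (a sketch, with a hypothetical KV namespace binding called HTML_KV):

```js
addEventListener('fetch', (event) => {
  event.respondWith(handle(event));
});

async function handle(event) {
  const cache = caches.default;

  // Try the Cache API first.
  let response = await cache.match(event.request);
  if (!response) {
    // Cache miss: fall back to KV (HTML_KV is a hypothetical binding name).
    const html = await HTML_KV.get(new URL(event.request.url).pathname, 'text');
    response = new Response(html, {
      headers: { 'content-type': 'text/html;charset=UTF-8' },
    });
    event.waitUntil(cache.put(event.request, response.clone()));
  }
  return response;
}
```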
Yeah, and it should be easy enough to wrap functions in a timer and log that too. I don’t think Workers supports the Performance API yet; we just use Date.now(), which I think is good enough.
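Something like this, for instance. One caveat: Date.now() doesn’t advance during synchronous execution in Workers (it only updates on I/O), so a wrapper like this measures elapsed time across awaits rather than pure CPU time:

```js
// Rough timing wrapper built on Date.now().
async function timed(label, fn) {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    console.log(`${label}: ${Date.now() - start}ms`);
  }
}

// Usage (HTML_KV as in the sketch above):
// const html = await timed('kv-read', () => HTML_KV.get('/index', 'text'));
```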
Our CPU limit enforcement is designed to be lenient with random spikes – in part because, practically speaking, there may be no way to avoid them. On a machine that is doing lots and lots of other things at the same time, random noise can cause the same computation to take much more CPU time sometimes.
Basically, you are only at risk of errors if your running average CPU usage is over the limit, or if a single request goes way over, like 10x.
I hesitate to explain the mechanism in more detail because we’re likely to change it soon. But, in short, you don’t need to worry about random single-request spikes.
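To make the rule of thumb concrete, here’s a toy illustration only (explicitly not the actual enforcement mechanism), assuming a hypothetical 10ms limit:

```js
const LIMIT_MS = 10; // hypothetical limit, for illustration

function atRiskOfErrors(cpuSamplesMs) {
  const avg = cpuSamplesMs.reduce((a, b) => a + b, 0) / cpuSamplesMs.length;
  const worst = Math.max(...cpuSamplesMs);
  // Over-limit running average, or a single request "way over, like 10x".
  return avg > LIMIT_MS || worst > 10 * LIMIT_MS;
}

atRiskOfErrors([2, 2, 30, 2, 2]); // false: low average, modest spike
atRiskOfErrors([12, 11, 13, 12]); // true: sustained average over the limit
```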
As for why you’re seeing a spike, I’m not entirely sure. We don’t actually count script startup time in the first request time, so it’s not that. But I think another possibility is lazy parsing. V8 tries to avoid parsing function bodies until the function is first called. If you have a fairly large code footprint, a lot of which gets called on the first request, that could explain the first request running a bit slower, I think. That said, I’m speculating here; it’s hard to say without doing some profiling (which, unfortunately, at present, you wouldn’t be able to do yourself – we’d like to fix that).
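For what it’s worth, the lazy-parsing effect looks roughly like this (an illustrative sketch, not a claim about any particular script):

```js
// V8 typically defers parsing/compiling a function body until the first call.
function renderPage(data) {
  // ...imagine a large templating function here...
  return `<html><body>${data.title}</body></html>`;
}

// The first call pays the deferred parse/compile cost on top of execution;
// later calls on the same warm instance do not.
renderPage({ title: 'first request (slower)' });
renderPage({ title: 'later requests (faster)' });
```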