So, I think this thread is conflating technical architecture with pricing…
A Worker can use the Cache API to implement arbitrary caching logic. A Worker that makes good use of the Cache API should be able to achieve the same performance as if the cache ran in front of the Worker. Hence, there is not a technical reason for Workers to run behind cache.
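To make the "cache inside the Worker" idea concrete, here's a minimal sketch of the cache-first pattern. In a real Worker you'd use the Cache API (`caches.default` with `cache.match()`/`cache.put()`); here a plain `Map` stands in for the cache and a stub function stands in for the origin fetch, just so the logic is self-contained and runnable anywhere:

```javascript
// Sketch of the cache-first pattern a Worker can implement itself.
// A Map stands in for the Cache API; in a real Worker you'd call
// caches.default.match(request) and caches.default.put(request, response).
const cache = new Map();

async function expensiveOrigin(key) {
  // Stand-in for a fetch to the origin or other costly work.
  return `response for ${key}`;
}

async function handleRequest(key) {
  // Check the cache first; on a hit, skip the expensive work entirely.
  const hit = cache.get(key);
  if (hit !== undefined) {
    return { body: hit, fromCache: true };
  }
  // Cache miss: do the expensive work, then store the result for next time.
  const body = await expensiveOrigin(key);
  cache.set(key, body);
  return { body, fromCache: false };
}
```

Because the Worker controls both the lookup and the store, it can implement arbitrary logic here (custom keys, selective caching, TTL policies) that a fixed cache-in-front layer couldn't.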
But it sounds like the motivation here is to avoid being billed for requests that hit cache. This assumes that if cache ran in front of Workers, then we wouldn’t bill for a Workers request when the request was served by cache.
However, that isn’t necessarily true. It’s entirely possible that in this scenario, Workers pricing would still be based on the number of requests received, including those that hit cache and therefore never invoked a Worker.
But why would that be? If clever caching saved the expense of running a Worker, shouldn’t we pass that savings on to the customer? Well, that’s the thing: Workers are really, really fast. The cost of executing a Worker (if it’s already in memory) is actually much cheaper than doing a cache lookup. The expensive part of Workers is distributing the code to the edge and keeping a huge number of different Workers in memory at the same time. Clever caching that eliminates 90% of requests doesn’t necessarily reduce that cost, because the Worker still needs to be loaded to handle the other 10% of requests.
So, as it turns out, between a Worker that makes good use of the Cache API, vs. putting cache in front of the Worker, the cost to Cloudflare is not that different. And hence, I don’t know that we’d necessarily want to charge less for the latter. (Disclaimer: This is entirely hypothetical. We haven’t actually talked about this internally.)
Now you might ask, if our costs aren’t necessarily tied to request volume, why do we bill on requests in the first place? Well, if we tried to break down our real costs and charge directly for them, our pricing would be incredibly complicated and hard for you to predict. You’d have to think about things like how many colos your Worker is likely to run in (which requires knowing how your users are distributed), whether your traffic patterns are spread out vs. bursty, etc. You probably would have a hard time calculating these things, but you probably do know roughly how many requests you get. So charging on requests makes it easy for you to know how much Workers will cost. And if the pricing doesn’t actually match our own costs to deliver the service, that’s our problem to deal with, not yours.
With that said, a big problem with this pricing model is that it means we’ve had to put strict limits on CPU time. As you know, with Workers Unbound, we are introducing a new pricing model that has a much lower base price per request, but also charges for duration, thus allowing us to remove those limits. But this is also good for fast Workers: if your Worker makes good use of the Cache API such that most requests return quickly from cache, then I would expect you will in fact end up paying less under Workers Unbound.
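A back-of-envelope calculation shows why a cache-heavy Worker can come out ahead under a duration-based model. All prices below are made-up illustrative numbers, not Cloudflare's actual rates:

```javascript
// Back-of-envelope cost comparison for a cache-heavy Worker.
// All dollar figures are illustrative assumptions, not real pricing.
const requests = 10_000_000; // monthly request volume

// Flat per-request model: same price no matter how fast the Worker runs.
const flatPerRequest = 0.50 / 1_000_000; // $ per request (illustrative)
const flatCost = requests * flatPerRequest;

// Duration-based model: lower base price per request, plus a charge
// proportional to memory held times wall-clock time (GB-seconds).
const basePerRequest = 0.15 / 1_000_000; // $ per request (illustrative)
const perGbSecond = 12.50 / 1_000_000;   // $ per GB-second (illustrative)
const memoryGb = 0.128;                  // e.g. a 128 MB instance
const avgSeconds = 0.005;                // 5 ms average, since most requests hit cache
const durationCost =
  requests * basePerRequest +
  requests * avgSeconds * memoryGb * perGbSecond;

console.log(`flat model:     $${flatCost.toFixed(2)}`);
console.log(`duration model: $${durationCost.toFixed(2)}`);
```

With these numbers the flat model costs $5.00 while the duration model costs about $1.58: when cache hits keep average duration tiny, the duration charge barely registers and the lower base price dominates.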
TL;DR: Putting cache in front of Workers would neither improve performance nor reduce cost compared to a Worker that makes effective use of the Cache API. OTOH, Workers that make good use of cache are likely to get cheaper under Workers Unbound.