R2 - Provide fallback to Worker for Public Buckets

Type

New feature

Description

Provide fallback to Worker for R2 Public Bucket requests

Benefit

Public Buckets, and Buckets accessed via pre-signed URLs, allow for nice, simple architectures for serving static or infrequently updated data, where Objects can be created/updated programmatically in response to changes in the source data. However, things become significantly more complicated when the requested data may not yet be available as an Object in the Bucket: the application consuming the data then has to handle the 404 response and make a separate, additional request to a different endpoint (probably a Worker) in order to trigger some action, such as generating the missing Object.

For example, we might be migrating from a legacy system, but only want to migrate specific data as required, upon first request, so the ability to trigger a Worker script which can pull this data, return it, and create the R2 Object for future requests would be incredibly useful.
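To sketch what I have in mind, here’s roughly what that fallback Worker could look like. This is purely illustrative: `LEGACY_ORIGIN` and the `BUCKET` binding are made-up names, and it assumes the Worker is only ever invoked when the Object is missing from the Bucket.

```ts
// Hypothetical fallback Worker, only ever invoked on an R2 miss.
// LEGACY_ORIGIN and the BUCKET binding are illustrative names.
interface Env {
  BUCKET: R2Bucket; // type from @cloudflare/workers-types
}

const LEGACY_ORIGIN = "https://legacy.example.com"; // hypothetical

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1);

    // Pull the missing data from the legacy system.
    const upstream = await fetch(`${LEGACY_ORIGIN}/${key}`);
    if (!upstream.ok) {
      return new Response("Not found", { status: 404 });
    }

    // (Any transformation of the legacy data would happen here.)
    const body = await upstream.arrayBuffer();
    const contentType =
      upstream.headers.get("content-type") ?? "application/octet-stream";

    // Persist to R2 so future requests are served straight from the
    // Public Bucket, then return the data to the caller.
    await env.BUCKET.put(key, body, { httpMetadata: { contentType } });
    return new Response(body, { headers: { "content-type": contentType } });
  },
};
```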

The commonly proposed approach of routing all requests through a Worker which pulls the data from the R2 Bucket greatly increases operational costs: we pay not only for the R2 storage and GetObject operation, but also for a Worker invocation and CPU time on every request, and for systems with a high volume of requests this could be cost prohibitive.
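For contrast, here’s a minimal sketch of that proxy pattern (again with a made-up `BUCKET` binding). Every single request, hit or miss, pays for a Worker invocation on top of the R2 operation:

```ts
// The proxy-everything approach: one Worker invocation per request,
// even when the Object already exists in the Bucket.
interface Env {
  BUCKET: R2Bucket;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1);

    const object = await env.BUCKET.get(key); // one GetObject per request
    if (object === null) {
      // ...generate/migrate the missing Object here, as above...
      return new Response("Not found", { status: 404 });
    }

    const headers = new Headers();
    object.writeHttpMetadata(headers);
    headers.set("etag", object.httpEtag);
    return new Response(object.body, { headers });
  },
};
```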

A fallback feature like this is available on the Object Storage offerings from other Public Cloud providers.

The Static Assets feature of Workers has something vaguely similar to this capability, in that requests with no matching Static Asset are routed to the Worker, but it appears to be implemented in a very different manner, using an Asset Manifest to identify the assets to be served rather than a simple fallback approach. Once deployed, these static assets can’t be updated programmatically in a Worker through the ASSETS binding, and that binding isn’t available to other Workers anyway. Whilst it would be possible to use Direct Uploads to add/update assets, this is rather long-winded, and dynamically created Objects uploaded in this manner wouldn’t survive a redeployment of the Worker, unless we also added them to source control, or another way to persist them were introduced. This is messy, completely impractical for large numbers of Objects, and I suspect it would perform very poorly in such cases.

There’s no way to implement this using the standard Page Rules, and whilst I appreciate that Custom Error Rules would allow this, and I’m certainly not averse to using them, they’re only available on the general (not Workers) paid plans, and this really does feel like a very basic feature which belongs within R2, at a Bucket level, rather than something which must be configured at the domain/site level.

From a Cloudflare user perspective, I see this feature being implemented as a simple URL, defined on the Bucket, to which the request would be forwarded instead of simply returning a 404; or, failing that, returned in the Location header of a 302 response.
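To be explicit about the 302 variant, this is how I’d expect it to look from the consumer’s side. None of this exists today; the fallback hostname below is illustrative only.

```ts
// Purely hypothetical behaviour of the proposed 302 fallback.
const res = await fetch("https://pub-xxxx.r2.dev/reports/2024-01.json");
// Today, a missing Object:       404 Not Found
// Proposed, with a fallback URL: 302 Found
//   Location: https://fallback.example.com/reports/2024-01.json
// With redirect: "follow" (the fetch default), the consumer would simply
// receive the generated Object from the fallback Worker, with no extra
// client-side handling required.
```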

Like this?

Or, if you need something different or for another provider and you’re on a paid plan, put a Snippet in front of your Worker. Use the Snippet to make the request to the bucket; if the object is found, just return it (and cache it). If not found, invoke the Worker to do the data migration work and return the file. That way you’re only invoking the Worker on an R2 miss, and only fetching from R2 on a cache miss.
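Something like this, sketched in TypeScript for clarity (Snippets themselves are plain JavaScript). The `pub-xxxx.r2.dev` and `worker.example.com` hostnames are placeholders, and this assumes the Cache API is available to your Snippet:

```ts
// Snippet sketch: edge cache -> public bucket -> migration Worker.
export default {
  async fetch(request: Request): Promise<Response> {
    const cache = caches.default;

    // 1. Serve from the edge cache when possible.
    const cached = await cache.match(request);
    if (cached) return cached;

    // 2. Cache miss: try the public bucket.
    const path = new URL(request.url).pathname;
    const fromBucket = await fetch(`https://pub-xxxx.r2.dev${path}`);
    if (fromBucket.ok) {
      const response = new Response(fromBucket.body, fromBucket);
      response.headers.set("cache-control", "public, max-age=3600");
      await cache.put(request, response.clone());
      return response;
    }

    // 3. R2 miss: invoke the Worker that migrates/generates the Object.
    return fetch(`https://worker.example.com${path}`);
  },
};
```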

Yeah, these are all possible approaches, but I’d really like to see this implemented in the same way all the other Public Cloud providers do it, which is nice and simple, and meets a wide range of needs.

It’s a fairly common requirement, and something I’ve personally needed and used many times on other Cloud platforms, so having to adopt work-arounds like these, most of which are only available on account-level paid plans, seems a bit much.

To be clear, I’ve absolutely no problem paying for things, but when I have no need for 99% of what a general account paid plan provides, that’s really not great, especially when it’s such a simple requirement, and the new capabilities I’d be getting still only really allow me to apply a work-around.

In terms of data migration, Sippy is great for the gradual migration of all data from other Object Storage solutions (which, granted, is a very common case, and what it’s intended for). With legacy systems, however, we often want to bring across only the relevant operational data, keeping the legacy system around for the recall of historical data, as these systems commonly host a crazy amount of data which is no longer relevant. The source data almost certainly isn’t in Object Storage, and it’s highly likely we’ll be manipulating it, applying what could be quite complex logic rather than just replicating it verbatim in R2, although clearly we’d have plenty of options to process it further once it’s in R2.