R2 with custom domain setup - 404 for a missing object returns a cache header with max-age of 24 hours rather than a short expiry

I have an R2 bucket with a custom domain setup.

When I request an object that doesn't exist using the domain name, it returns a 404 as expected. However, it also returns a cache-control: max-age=86400 header (cache for one day).

I read that Cloudflare’s cache, by default, should cache 404 responses for 5 minutes.

I don’t have any custom cache rules set up or response header modifications.

Response headers:
[screenshot: response headers showing cache-control: max-age=86400]

Expected behaviour: I am expecting 404 responses from R2 to return either a cache-control: no-cache header or a shorter expiry, e.g. a max-age of 5 minutes.

I imagine that this would need to come from the origin, R2.

Thanks again


Interestingly, I just set up a new bucket against another domain to check whether it behaved the same.

For this second bucket, if I try to access an object path that doesn't exist but has an extension that would usually be cached (e.g. .jpg, .png), it returns a cache-control: max-age=14400 header, which is 4 hours - so that's shorter. I'm not sure at this point what is causing the difference in cache-control length between the two buckets/domains.

So it looks like there is a difference in behaviour from one of my domains to the other, but it still isn’t a particularly short cache for a 404 response.


You are mixing two caches here. The five minutes apply to the proxy cache and that’s how long it will be cached on the proxies. What you are referring to is the browser cache and that’s unrelated.

Yes, you would need to configure this on the origin - or you could override it with a page or cache rule.

Thank you for your response.

According to https://developers.cloudflare.com/cache/about/cache-control/, the cache-control header is used to tell Cloudflare how to handle caching:

Set Cache-Control headers to tell Cloudflare how to handle content from the origin.

I understand that the browser will use the cache-control header to decide whether and how long to cache an item, but I am under the impression that the proxy cache also adheres to the value that comes back from the origin to decide how long to cache a response for, is that incorrect? Does the proxy cache only cache 404s for 5 minutes no matter what?

So with that in mind, I think there is a problem here: there doesn't seem to be a way to control the cache-control header for an object that doesn't exist in the R2 bucket, short of applying a blanket rule that overrides the cache-control header for all requests to my connected bucket's domain - which, for a bucket of highly static and cacheable objects, would be undesirable to say the least.

I wouldn’t want to set a cache-control response header of 5 minutes just to satisfy 404 requests…

Sorry if I’m misunderstanding, thanks for your help.


Just in case it's interesting/useful to any other travellers around these parts - I have at least been able to confirm that if I put an object into R2 with a CacheControl property, that value is returned in the cache-control header for objects that do exist.

// CloudflareR2 is an AWS.S3 client (aws-sdk v2) configured with the R2 endpoint
await CloudflareR2.putObject({
    Key: sourceKey,
    Body: sourceObject.Body,
    Bucket: cloudflareImageBucketName,
    ContentType: sourceObject.ContentType,
    CacheControl: 'max-age=31536000' // one year
}).promise();

But as far as I can see it’s not possible to control this header for objects that don’t exist. Unless it’s possible to put metadata onto the “directory” level in R2…?


So essentially your question is how to control the caching for non-existing items in R2, right? @sdayman may have an answer to that, he is our resident caching guy :smile:

Thank you, yes that’s the question. :relaxed::relaxed::relaxed:

Just one more comment from my non-R2 side :slight_smile: if nothing else works, you could use a Worker where you specifically set the caching time for 404 responses - but of course that would be paid beyond 100,000 requests, so a native way would be more elegant.
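For anyone curious what that Worker might look like, here is a minimal sketch. The helper below is the core of it: it passes non-404 responses through untouched and rewrites the cache-control header on 404s. The function name, the 5-minute default, and the wiring comment are all illustrative, not an official pattern.

```javascript
// Pure helper: clone a response, overriding Cache-Control on 404s only.
// In a Worker routed in front of the R2 custom domain, you would wrap
// the origin fetch with it, e.g.:
//   export default {
//     async fetch(request) { return withShort404Cache(await fetch(request)); }
//   };
function withShort404Cache(response, seconds = 300) {
  if (response.status !== 404) return response; // leave normal responses alone
  const headers = new Headers(response.headers);
  headers.set('cache-control', `public, max-age=${seconds}`);
  return new Response(response.body, { status: 404, headers });
}
```

Note this only changes the header that browsers (and Cloudflare's cache, on subsequent passes) see; it doesn't avoid the per-request Worker cost mentioned above.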

Yep. I already tried the progressive transfer from S3 to R2 last month, and I rinsed through over 8 million Worker requests after deploying it on and off for brief periods over the course of 4 days. That's completely unsustainable for me for serving static assets. Cloudflare really shines when it comes to caching static assets, but it seems like this may be set up incorrectly regarding the header returned for 404s - I'm aware that R2 is a beta product, so it comes with the territory, I suppose.

Even for items that do exist CacheControl doesn’t seem to be working as expected. It works for new objects but not existing ones.

For me, once an item is cached, if I purge the cache, the cache-control header doesn't appear to be re-evaluated on re-retrieval.

So I’ve set CacheControl metadata for an image within my images bucket in R2.

Using this command:

aws s3 cp s3://images/content/images/quiz/694956_1666818541_70486185-8199-42ae-a492-4e6892172ff2.jpg s3://images/content/images/quiz/694956_1666818541_70486185-8199-42ae-a492-4e6892172ff2.jpg --cache-control max-age=12345 --endpoint-url https://MYR2IDHERE.r2.cloudflarestorage.com --metadata-directive REPLACE

Then I can confirm that the item is updated correctly by using head-object on the object:

aws s3api head-object --bucket images --key content/images/quiz/694956_1666818541_70486185-8199-42ae-a492-4e6892172ff2.jpg --endpoint-url https://MYR2IDHERE.r2.cloudflarestorage.com
{
    "AcceptRanges": "bytes",
    "LastModified": "Sun, 06 Nov 2022 12:14:19 GMT",
    "ContentLength": 107477,
    "ETag": "\"cb2130ca2c44c28086f619bc336371c2\"",
    "CacheControl": "max-age=12345",
    "ContentType": "image/jpeg",
    "Metadata": {}
}

The request misses the cache, so the object should be re-retrieved from R2, but the cache-control header stays as it originally was (it should be max-age=12345).


I have put a ticket in for this issue as I believe that it is probably unwanted behaviour from the point of view of R2 + domain setup.


According to Cloudflare support, items that are not found in R2 return a cache-control TTL that is configurable under Caching > Configuration in my account.

As mentioned by @sandro, I was mixing up two types of caching here. Although Cloudflare returns a cache HIT on a 404 for my not-found resources, along with a cache-control header whose TTL is longer than the few minutes I expected, the Cloudflare edge cache will only hold that 404 for a limited time before it is dropped. You'd need an Enterprise plan to set a cache-control header that varies based on the HTTP status code.

The cache-control header comes from the CacheControl setting on the object within R2, which I'm assuming Cloudflare will honor to some extent when caching resources. It can be set for existing objects under a folder prefix using the following AWS CLI command:

aws s3 cp s3://yourbucket/yourfolder/ s3://yourbucket/yourfolder/ --cache-control "public, max-age=31535000" --endpoint-url https://[YOUR ACCOUNT ID HERE].r2.cloudflarestorage.com --metadata-directive REPLACE --recursive

That'll do a server-side copy of your resources in R2 and update the CacheControl property. Or you can set it when you first put the object into the bucket - see the Node.js code in my previous post.
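The same self-copy trick can be done from Node.js with the S3 CopyObject API. The sketch below only builds the request parameters (the helper name and bucket/key values are illustrative); the key detail is MetadataDirective: 'REPLACE', without which S3-compatible stores keep the old metadata and ignore the new CacheControl.

```javascript
// Hypothetical helper: build CopyObject params that copy an object onto
// itself while replacing its metadata, so CacheControl gets rewritten.
function cacheControlRewriteParams(bucket, key, cacheControl) {
  return {
    Bucket: bucket,
    CopySource: `${bucket}/${key}`,     // source = destination: in-place rewrite
    Key: key,
    CacheControl: cacheControl,
    MetadataDirective: 'REPLACE',       // required, or old metadata is retained
  };
}

// Usage with an aws-sdk v2 S3 client pointed at the R2 endpoint:
// await CloudflareR2.copyObject(cacheControlRewriteParams(
//   'images', 'content/images/example.jpg', 'public, max-age=31536000')).promise();
```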

Thanks everyone, including Cloudflare support for help.


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.