Cloudflare isn't properly caching S3 bucket

Hey there, I have a website https://angular.expert

I am trying to cache requests for images stored on S3 using cloudflare.

Here is my current setup:

  • A publicly accessible S3 bucket.
  • A CNAME record set up in Cloudflare DNS that maps assets.angular.expert to my S3 bucket at assets.angular.expert.s3.amazonaws.com. Cloudflare proxying is enabled on this CNAME record (the orange cloud :)).
  • Images can be requested from the https://angular.expert website with or without CORS, as expected. E.g. https://assets.angular.expert/banners/lkr8pfklrob453bgvs2wis7a.gif

Images are being cached, but only sometimes. It seems that Cloudflare is caching multiple different responses from S3, and I can tell based on the x-amz-request-id header. At a given time, I’m seeing around 7 different x-amz-request-id header variations returned from cloudflare.

My question is, why doesn’t cloudflare just cache ONE response from S3; why is it caching multiple responses?

Out of desperation, I added a Page Rule setting the Cache Level to “Cache Everything”, but that doesn’t seem to make a difference.

Just to help illustrate what I’m seeing, here are some non-CORS requests to https://assets.angular.expert/banners/ijebdyjwo9sahl519t8f0de5.webp

Each of these represents a single GET request:

  1. cf-cache-status: MISS, x-amz-request-id: A2M4G0J9C4ZBQANY
  2. cf-cache-status: MISS, x-amz-request-id: 3X858M348T8Y9X08
  3. cf-cache-status: MISS, x-amz-request-id: 3AZ8E48S5CBSC0PV
  4. cf-cache-status: HIT, x-amz-request-id: A2M4G0J9C4ZBQANY
  5. cf-cache-status: MISS, x-amz-request-id: S3FR3D38130KBAMV
  6. cf-cache-status: MISS, x-amz-request-id: R4DC6BZS351KVTQR
  7. cf-cache-status: HIT, x-amz-request-id: 3AZ8E48S5CBSC0PV
  8. cf-cache-status: MISS, x-amz-request-id: CTF2NJYBGE3AMTZY
  9. cf-cache-status: HIT, x-amz-request-id: CTF2NJYBGE3AMTZY

…etc.
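To make the pattern in that list easier to see, here is a small sketch that tallies the nine observations by x-amz-request-id. It only hard-codes the values from this post; the function name `summarize` is made up for the illustration.

```python
from collections import Counter

# The nine (cf-cache-status, x-amz-request-id) pairs listed above, in order.
OBSERVATIONS = [
    ("MISS", "A2M4G0J9C4ZBQANY"),
    ("MISS", "3X858M348T8Y9X08"),
    ("MISS", "3AZ8E48S5CBSC0PV"),
    ("HIT", "A2M4G0J9C4ZBQANY"),
    ("MISS", "S3FR3D38130KBAMV"),
    ("MISS", "R4DC6BZS351KVTQR"),
    ("HIT", "3AZ8E48S5CBSC0PV"),
    ("MISS", "CTF2NJYBGE3AMTZY"),
    ("HIT", "CTF2NJYBGE3AMTZY"),
]


def summarize(observations):
    """Count cache statuses and collect the distinct origin request IDs."""
    statuses = Counter(status for status, _ in observations)
    distinct_ids = {request_id for _, request_id in observations}
    return statuses, distinct_ids


statuses, distinct_ids = summarize(OBSERVATIONS)
print(f"MISS={statuses['MISS']} HIT={statuses['HIT']} distinct={len(distinct_ids)}")
# prints: MISS=6 HIT=3 distinct=6
```

Each HIT reuses an ID that first appeared on an earlier MISS, so these nine requests correspond to six separate fetches from S3.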

I should also mention that all images from the bucket are being served with cache-control set to public, max-age=604800, immutable.

The cache-control header returned through Cloudflare, however, is public, max-age=604800 (without immutable) for some reason, not sure why.
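A quick way to pin down exactly what differs between the two header values is to compare them as sets of directives. This is a rough sketch: `parse_cache_control` is a helper invented for this example, and the naive comma split ignores quoted values, which is fine for these two strings.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a set of lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


# Header values as reported in this post.
origin = parse_cache_control("public, max-age=604800, immutable")
edge = parse_cache_control("public, max-age=604800")

dropped = origin - edge  # directives set on S3 but missing from the Cloudflare response
added = edge - origin    # directives present only in the Cloudflare response
print(sorted(dropped), sorted(added))
# prints: ['immutable'] []
```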

Each Cloudflare colo has its own cache. So if your request hits different colos, you will see different responses.
In a quick test, I hit FRA, AMS and CDG. First request was a MISS for each, subsequent requests were HIT.
That’s 3 different locations in 10 requests.
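As a rough illustration of why independent per-colo caches produce several MISSes for one asset, here is a toy simulation. Everything in it is an assumption made for the example: the `Colo` class, the asset path and the routing order are invented, and eviction is not modeled at all.

```python
class Colo:
    """Toy model of one data center with its own independent cache."""

    def __init__(self, name):
        self.name = name
        self.cache = set()

    def get(self, path):
        if path in self.cache:
            return "HIT"
        # On a MISS the asset is fetched from the origin and stored locally.
        self.cache.add(path)
        return "MISS"


colos = {name: Colo(name) for name in ("FRA", "AMS", "CDG")}

# Hypothetical routing of 10 requests across the three colos.
route = ["FRA", "AMS", "FRA", "CDG", "AMS", "CDG", "FRA", "AMS", "CDG", "FRA"]
results = [colos[name].get("/banners/example.webp") for name in route]

print(results.count("MISS"), results.count("HIT"))
# prints: 3 7
```

In this model the first request to each colo is a MISS and every later request to the same colo is a HIT, which matches the FRA/AMS/CDG test described above.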

The time a response is cached depends on your plan and the activity on the cached asset. An asset with low activity may be purged very quickly, within minutes, so new requests will reach your origin.

If you want fewer requests to hit your origin, you could look into Tiered Cache and Cache Reserve.
With Tiered Cache, Cloudflare will check if an asset is available on a different colo before requesting it from your origin.
Cache Reserve is a paid service that saves your assets in Cloudflare’s R2 bucket and guarantees that only one request hits your origin per asset and period.
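The effect of Tiered Cache on origin traffic can be sketched with a toy model. This is not how Cloudflare implements it: the single shared upper tier, the function name `origin_fetches` and the routing order are all assumptions for illustration, and eviction is ignored.

```python
def origin_fetches(route, tiered):
    """Count origin requests for one asset requested through the colos in `route`."""
    local = set()         # colos that already hold the asset
    upper_has_it = False  # whether the (assumed) shared upper tier holds the asset
    fetches = 0
    for colo in route:
        if colo in local:
            continue  # local HIT, nothing leaves the colo
        if tiered and upper_has_it:
            local.add(colo)  # filled from the upper tier, origin untouched
            continue
        fetches += 1  # this request goes all the way to the origin
        local.add(colo)
        if tiered:
            upper_has_it = True
    return fetches


route = ["FRA", "AMS", "CDG", "FRA"]  # hypothetical routing
print(origin_fetches(route, tiered=False), origin_fetches(route, tiered=True))
# prints: 3 1
```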

Thanks for your reply; enabling tiered caching seems to help with avoiding MISSes.

I guess I’m still a bit confused, because the cf-ray response header should indicate which Cloudflare colo served the cached response, right? For example, my cf-ray always ends with ORD (e.g. 83b3fa689b34618d-ORD), but I’m still receiving more than one MISS for a single image.

I would think that after the first MISS response from a particular colo like ORD, subsequent responses from ORD would all be HITs.

Or is it the case that ORD has multiple of its own caches that are load balanced? Just a bit odd because I’ve inspected network calls for other websites, and they always return the same exact cached response (I can tell based on x-amz-request-id)
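One way to check this more systematically would be to tally MISS responses by the colo suffix of cf-ray. The sketch below assumes the suffix after the last hyphen is the colo code, as in the example above; the helper names are invented, and every ray ID in the sample except the first is a placeholder, not a real response.

```python
from collections import Counter


def colo_from_ray(cf_ray):
    """Return the part of a cf-ray value after the last hyphen, e.g. 'ORD'."""
    _, _, colo = cf_ray.rpartition("-")
    return colo


def misses_per_colo(responses):
    """Count MISS responses per colo from (cf-ray, cf-cache-status) pairs."""
    return Counter(colo_from_ray(ray) for ray, status in responses if status == "MISS")


# The first ray ID is from this post; the others are made-up placeholders.
sample = [
    ("83b3fa689b34618d-ORD", "MISS"),
    ("0000000000000001-ORD", "MISS"),
    ("0000000000000002-ORD", "HIT"),
]
print(misses_per_colo(sample))
# prints: Counter({'ORD': 2})
```

A count above 1 for a single colo would be consistent with what I’m seeing, but it doesn’t explain on its own why it happens.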