Use a CF worker and the cache API to guarantee cache HIT

I’m using the following CF features to try and mimic a complete edge experience for most of my site.

Argo Tiered Cache

Cache Reserve

“Cache everything” page rule

This seems to work 99% of the time, but every once in a while I’ll see Googlebot or a different user agent get served a MISS for a URL that should fall under the above rules.

I need this to be 100%, as a URL served from the origin can take a couple of seconds to load (PHP and DB issues I need to resolve).

Is there a way to use a CF worker and the cache api to make sure a URL never gets served with a MISS?

Or perhaps there is another solution I’m not thinking of?

Would propagating the cache using a VPN from various locations do the trick?

Any insight here would be very much appreciated

Does anyone have some insight here?

So after playing with ChatGPT for a bit I was able to come up with this…

Good idea?

const alternativeDataCenters = [
  'lhr.cloudflare.com', // London
  'nrt.cloudflare.com', // Tokyo
  'lax.cloudflare.com', // Los Angeles
];

addEventListener('fetch', (event) => {
  // Pass the whole event so handleRequest can call event.waitUntil().
  event.respondWith(handleRequest(event));
});

// fetch() that aborts after `timeout` milliseconds.
async function fetchWithTimeout(url, options, timeout) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeout);
  const fetchOptions = {
    ...options,
    signal: controller.signal,
  };

  try {
    return await fetch(url, fetchOptions);
  } finally {
    // Clear the timer whether the fetch succeeded or threw.
    clearTimeout(timeoutId);
  }
}

async function handleRequest(event) {
  const request = event.request;
  const cache = caches.default;
  let response = await cache.match(request);

  if (!response) {
    const timeout = 3000;

    try {
      response = await fetchWithTimeout(request, {}, timeout);
    } catch (error) {
      // Fetch from alternative Cloudflare data centers if the origin times out or fails
      for (const dataCenter of alternativeDataCenters) {
        try {
          const fetchOptions = {
            cf: {
              resolveOverride: dataCenter,
            },
          };
          response = await fetchWithTimeout(request, fetchOptions, timeout);
          break;
        } catch (error) {
          console.log(`Failed to fetch from ${dataCenter}`);
        }
      }
    }

    if (response && response.status === 200) {
      const responseClone = response.clone();
      event.waitUntil(cache.put(request, responseClone));
    } else {
      // Fallback content if no valid response is received
      response = new Response('Fallback content', {
        status: 200,
        headers: {
          'Content-Type': 'text/plain',
        },
      });
    }
  }

  return response;
}

There are a bunch of factors to consider here. I’ll start with your first post.

There are several limitations and requirements assets must meet for Cache Reserve to be used.
Namely:
Be cacheable, according to Cloudflare’s standard cacheability factors,
Have a freshness time-to-live (TTL) of at least 10 hours (set by any means such as Cache-Control / CDN-Cache-Control origin response headers, Edge Cache TTL, Cache TTL By Status, or Cache Rules),
Have a Content-Length response header.
Origin Range requests are not supported at this time from Cache Reserve.
Vary for Images is currently not compatible with Cache Reserve.
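
As a rough way to spot assets that fail two of the requirements above, the origin response headers can be checked for a Content-Length and a TTL of at least 10 hours. This is only a sketch: the function names are made up for illustration, it assumes headers arrive as a plain object with lowercase keys, and it cannot see TTLs set through Edge Cache TTL or Cache Rules.

```javascript
const MIN_TTL_SECONDS = 10 * 60 * 60; // 10 hours, per the requirement listed above

// Read a numeric directive such as max-age or s-maxage from a header value.
function directiveSeconds(headerValue, name) {
  const pattern = new RegExp(`(?:^|,)\\s*${name}\\s*=\\s*(\\d+)`, 'i');
  const match = pattern.exec(headerValue || '');
  return match ? Number(match[1]) : null;
}

// Pick the TTL a shared cache would most likely use: CDN-Cache-Control first,
// then s-maxage, then max-age from Cache-Control. Returns 0 if none is set.
function originTtl(headers) {
  const candidates = [
    directiveSeconds(headers['cdn-cache-control'], 's-maxage'),
    directiveSeconds(headers['cdn-cache-control'], 'max-age'),
    directiveSeconds(headers['cache-control'], 's-maxage'),
    directiveSeconds(headers['cache-control'], 'max-age'),
  ];
  const first = candidates.find((value) => value !== null);
  return first === undefined ? 0 : first;
}

// Hypothetical helper: list the header-level problems found for one response.
function cacheReserveIssues(headers) {
  const issues = [];
  if (!('content-length' in headers)) {
    issues.push('missing Content-Length header');
  }
  if (originTtl(headers) < MIN_TTL_SECONDS) {
    issues.push('TTL from origin headers is under 10 hours');
  }
  return issues;
}

// Example: an HTML response with a one-hour TTL and no Content-Length.
console.log(cacheReserveIssues({ 'cache-control': 'public, max-age=3600' }));
```

An empty array only means these two header checks passed; the other limitations in the list still apply.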

Cache Reserve should be a good fallback for a lot of assets, but some may still be slipping through because they fail other requirements (a cache TTL that is too short, a missing Content-Length header) or are simply expiring and needing to be refreshed. If your cache rate is truly 99%, that’s pretty good.

Normal cache can be quite effective on its own as well, but you need requests flowing to keep it warm. Especially if you are on the Free plan with low request volume, your assets will be evicted sooner rather than later. Using a VPN wouldn’t help: you’d fill up the caches in a few locations for a bit, but if you weren’t getting enough requests to keep your assets in cache to begin with, you’d have the same issue.

Using a Cloudflare Worker with the Cache API is a major step back. The Cache API does not use Tiered Cache or Cache Reserve; it only interacts with the local cache of the colo (Cloudflare data center) the Worker is running in.

That Worker is entirely a result of ChatGPT’s hallucinations (I also assume that was GPT-3.5 and not 4). That’s not how you use resolveOverride, and the whole alternative data center idea would never work and makes no sense.

In my opinion, you’re spending too much time trying to fix the symptoms of the issue instead of the issue itself. Even with all of the layers of cache in the world, cache is eventually going to expire. That’s the nature of cache.

If you’re using something like WordPress, there are plugins to export your site to static html/css/js that you can then throw onto Cloudflare Pages, which would not require your origin at all and all of your assets would be served from Cloudflare itself:

There’s no way to absolutely guarantee a hit. If you want a “complete edge experience”, you could switch to something like Cloudflare Pages, but that’s only going to work with static html/css/js or supported SSR Frameworks (not php).


I appreciate the reply @Chaika

It’s close to it, yes. I’ve set response headers to a TTL of months in most cases.

I’m on the Pro plan, and this is very interesting and makes sense. I’m sure the pages are not getting enough hits, so this could be the reason they’re being evicted.

It’s 4 actually, but I see your point.

I think you sold me here. It’s not consistent and is a little over my head; I had a programmer look at it a while ago, but he was never able to figure it out. It’s Joomla 3 with a bunch of custom modifications. I’m thinking it’s SQL related, but I will need to really look into it.

I’m curious: 99% of the site is static; it’s just the user login and backend that are really dynamic. Could a solution be to keep the login and admin in their current state and use CF Pages for everything else?

In the past, you would have needed locations in at least the 285 cities that CF operates data centers in. But Cloudflare Tiered Caching (Tiered Cache · Cloudflare Cache (CDN) docs) should help with that somewhat. If you analyze your CF Tiered Caching analytics and figure out your Tiered Cache data center locations, you can probably just hit those ones to keep the cache pre-warmed.

That isn’t possible, as Cloudflare CDN caching is done on a per-data-center basis and CF has a presence in over 285 cities and 100 countries. There are many steps you can take to get closer to nearly always being in the CF CDN cache, like:
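
A minimal sketch of that kind of warm-up check, assuming Node 18+ (for the global fetch) and a hand-maintained URL list: it requests each URL and records the cf-cache-status response header. The function names are illustrative only, and a request made this way only touches the data center that happens to serve the machine running the script.

```javascript
// Request each URL once and record Cloudflare's cf-cache-status response header.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function checkCacheStatus(urls, fetchImpl = fetch) {
  const results = [];
  for (const url of urls) {
    const response = await fetchImpl(url, { method: 'GET' });
    const status = response.headers.get('cf-cache-status') || 'NONE';
    results.push({ url, status });
  }
  return results;
}

// Return the URLs that were not served from cache on this pass.
function findNonHits(results) {
  return results.filter((r) => r.status !== 'HIT').map((r) => r.url);
}

// Example usage (the URL is a placeholder):
// checkCacheStatus(['https://example.com/'])
//   .then((results) => console.log(findNonHits(results)));
```

Running it twice in a row and comparing the non-HIT lists gives a quick idea of which pages are not staying in cache.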

  1. Cache Everything Page Rule or the CF Cache Rule equivalent
  2. Cache Reserve, but only if your origin responses include a Content-Length header, which most gzip/brotli-compressed origin responses won’t; in that case Cache Reserve ends up making those assets slower to serve rather than faster
  3. Cloudflare normalization at the edge and origin (URL normalization · Cloudflare Rules docs)
  4. Cloudflare Transform Rules for URL rewrites, where you can tell certain URLs to ignore query strings that would otherwise bypass the cache
  5. Cloudflare Tiered Cache (Tiered Cache · Cloudflare Cache (CDN) docs)
  6. The Cloudflare Enterprise plan’s Cache Prefetch, which prefetches the URLs you tell it to and pre-warms the CF CDN cache (Prefetch URLs · Cloudflare Speed docs)
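
To illustrate the query string point in item 4: the list describes Transform Rules, but the same idea can be sketched in a Worker, which is a swapped-in technique here and not what the list recommends. The parameter patterns below are assumptions chosen for the example, not a Cloudflare-provided list.

```javascript
// Hypothetical list of query parameters that do not change the page content.
const IGNORED_PARAMS = [/^utm_/i, /^fbclid$/i, /^gclid$/i];

function normalizeUrl(input) {
  const url = new URL(input);
  // Collect keys first; deleting while iterating searchParams can skip entries.
  const keys = [...new Set(url.searchParams.keys())];
  for (const key of keys) {
    if (IGNORED_PARAMS.some((pattern) => pattern.test(key))) {
      url.searchParams.delete(key);
    }
  }
  // Sort what is left so ?a=1&b=2 and ?b=2&a=1 map to the same URL.
  url.searchParams.sort();
  return url.toString();
}

async function handleRequest(request) {
  // Fetch the normalized URL so tracking parameters do not split one page
  // across several differently-keyed cache entries.
  return fetch(new Request(normalizeUrl(request.url), request));
}

// Register the handler only where a Workers-style runtime provides the hook.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

Only strip parameters the origin truly ignores; removing one that changes the response would serve the wrong page from cache.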

URL prefetching means that Cloudflare pre-populates the cache with content a visitor is likely to request next. This setting — when combined with additional setup — leads to a higher cache hit rate and thus a faster experience for the user.

But you can’t fully mask your origin’s server-side slowness with the CF CDN, as the CF CDN doesn’t cache all assets by default (Default Cache Behavior · Cloudflare Cache (CDN) docs).

My TTFB optimization guide, Improving Time To First Byte (TTFB) With Cloudflare, covers three segments of optimization; the last, your origin server-side optimizations, still needs to be taken care of.

Indeed, in such cases, instead of using Cloudflare Workers with the Cache API, you can use the fetch API for Worker caching (How the Cache works · Cloudflare Workers docs):

Tiered caching

The Cache API is not compatible with tiered caching. To take advantage of tiered caching, use the fetch API.
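
A minimal sketch of that approach, assuming the same service-worker style as the Worker posted earlier in the thread: caching is requested through the `cf` options on fetch() (`cacheEverything` and `cacheTtl`) and `caches.default` is never called. The 30-day TTL and the bypass path prefixes are assumptions for illustration; adjust them to the real site.

```javascript
// Illustrative edge TTL (30 days); not a recommendation.
const EDGE_TTL_SECONDS = 60 * 60 * 24 * 30;

// Hypothetical path prefixes for the dynamic parts of the site (login, backend).
const BYPASS_PREFIXES = ['/administrator', '/login'];

// Decide whether a request should skip caching and go straight to the origin.
function shouldBypassCache(url) {
  const { pathname } = new URL(url);
  return BYPASS_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}

// Build the second argument for fetch(): cf cache options, or nothing on bypass.
function buildFetchOptions(bypass) {
  if (bypass) return {};
  return { cf: { cacheEverything: true, cacheTtl: EDGE_TTL_SECONDS } };
}

async function handleRequest(request) {
  // fetch() goes through Cloudflare's normal cache, unlike caches.default.
  return fetch(request, buildFetchOptions(shouldBypassCache(request.url)));
}

// Register the handler only where a Workers-style runtime provides the hook.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

This still cannot guarantee a HIT, since entries expire and get evicted, but it keeps the Worker from being limited to the local colo cache the way the Cache API is.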


Thanks for this !

You’re right, the only things that return a Content-Length header are PNG images and fonts. If I add the Content-Length header (via a CF Transform Rule) to the HTML, SVG, JS, and CSS, would that work to keep them in the reserve?

Edit: does the Content-Length value have to be exact for this to work?


I haven’t tried incorrect/inaccurate Content-Length headers. I don’t think a CF Transform Rule would work; the header would have to come from your origin server. I wouldn’t rely on Cache Reserve anyway; Tiered Cache would be a better option.


I agree, the main problem is the Content-Length header requirement, and there is still a lack of docs/tutorials on how to set it up and get it working.

Why don’t you try using R2 for this?

Will look into this, not really for this topic (as it sorted itself out), but in general.