Random Cloudflare POPs return empty cached content, how can I check for this?

I have a Worker which caches a page in Cloudflare if certain conditions are met, otherwise it bypasses the cache and fetches from the origin.

It works well, but sometimes a Cloudflare POP will simply return a blank cached page. Accessing the page via other POPs works as expected. A user has reported an affected URL now from the Seattle POP which has allowed me to experiment further with this bug.

I checked my origin server’s Nginx logs and found it served all requests to the URL with a successful 200 response and the expected $body_bytes_sent. This confirms the problem occurred in Cloudflare’s Seattle POP.

Has anyone else experienced something like this, or does anyone have recommendations for mitigation? It seems Cloudflare’s cache cannot be trusted and I will have to come up with some checks on my own for this condition if I want to keep using it.

I am already checking for a successful 200 HTTP response code before storing pages into the cache, but that clearly isn’t the issue. My only other idea is to actually check the response body returned from the cache is not empty before sending it back to the user, but I am concerned that might introduce a small delay and increase the Worker’s TTFB.
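A minimal sketch of that read-side check, assuming the Worker is willing to buffer the cached body before replying. `isBlankResponse` is a hypothetical helper name, not part of the Workers API. It reads a clone so the original body is still available to send to the visitor; the buffering is where any extra TTFB would come from.

```javascript
// Hypothetical helper: resolves to true when a Response carries no body bytes.
async function isBlankResponse(response) {
  // Cheap check first: an explicit Content-Length of 0 means an empty body.
  if (response.headers.get('Content-Length') === '0') return true

  // Otherwise read a clone, so the original body stays unread for the client.
  const bytes = await response.clone().arrayBuffer()
  return bytes.byteLength === 0
}
```

In `handleRequest`, a blank hit could then be treated as a miss, for example by calling `cache.delete(cacheKey)` inside `event.waitUntil()` and falling through to the origin fetch.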

This problem is particularly insidious in that I have no way of knowing when a given POP somewhere around the world is returning a blank white page for a URL (unless a user reports it) and there are no error messages or error status codes logged anywhere. If it happens in a major POP I could lose a lot of traffic to my site.
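One way to get some visibility, sketched under assumptions: whenever the Worker notices a blank cache hit, it can record which data center served it. Inside a Worker, `request.cf.colo` holds the airport code of the data center handling the request. `blankHitRecord` is a hypothetical helper, and where the record gets shipped (console, a logging endpoint) is left open.

```javascript
// Hypothetical helper: builds a log record describing a blank cache hit.
function blankHitRecord(request) {
  // request.cf is only populated inside Workers, so fall back to an empty object.
  const cf = request.cf || {}
  return {
    url: request.url,
    colo: cf.colo || 'unknown',
    ray: request.headers.get('CF-Ray') || 'unknown',
    at: new Date().toISOString(),
  }
}
```

The record could be written with `console.log(JSON.stringify(blankHitRecord(request)))` or sent elsewhere inside `event.waitUntil()`.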

Never experienced that, but I only cache 200 responses, and only if there is content in the request body. Do you enforce any cache logic? Also, I only push to the cache with event.waitUntil(), after the user gets the response.

Yes. My code looks something like this:

addEventListener('fetch', event => {
  event.passThroughOnException()
  event.respondWith(handleRequest(event))
})

async function handleRequest(event) {
  const request = event.request

  // Build cache key
  const cache = caches.default
  const url = new URL(request.url)
  const cacheKey = url.origin + url.pathname

  // Check the cache
  let response = await cache.match(cacheKey)

  if (!response) {
    // If no response from the cache, fetch from the origin
    response = await fetch(request)

    // Do not cache an unsuccessful (non-200) response
    if (response.status != 200) {
      return response
    }

    // My cache logic
    // ....

    if (cache_this) {
      response = new Response(response.body, response)
      response.headers.append("Cache-Control", "s-maxage=604800")
      event.waitUntil(cache.put(cacheKey, response.clone()))
    }
  }

  return response
}

@adaptive, can you share more details about how you are checking for “content in the request body”?

I’d be curious to see your technique for this as I figure out the best way to do it myself.

This is still an issue I’m experiencing and I am running out of ideas on how to resolve it on my own.

Troubleshooting is nearly impossible because I have to rely on an affected user to report the issue, confirm which CF POP they are connecting to, and then use a VPN in the hopes that I might be able to connect to that same POP which is showing the blank page.
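For user reports, the POP can usually be read from the `CF-Ray` response header, whose value ends in the data center's airport code, or from the `colo=` line at `/cdn-cgi/trace` on the affected hostname. A small sketch for pulling the code out of a header value; `popFromRay` is a hypothetical helper name and the sample value in the comment is made up.

```javascript
// Hypothetical helper: extracts the data center code from a CF-Ray header value.
function popFromRay(cfRay) {
  if (!cfRay) return null
  // The code follows the last dash, e.g. "230b030023ae2822-SEA".
  const dash = cfRay.lastIndexOf('-')
  return dash === -1 ? null : cfRay.slice(dash + 1)
}
```

A reporter could paste the header from their browser's network tab, and `popFromRay` would give the POP without any guesswork over a VPN.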

@adaptive, sorry to tag you again, but can you share more details about how you’re checking the content of the request or response body? It’s not clear to me how to read it.
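One possible way to read the body, offered as a sketch rather than what @adaptive actually does: buffer the origin response with `arrayBuffer()`, check its length, and build fresh `Response` objects from the buffer. `bufferForCache` is a hypothetical helper. The `GET` check is an assumption worth ruling out, not something confirmed in this thread: a `HEAD` request gets a 200 with no body, and code that caches any 200 under a string key could end up storing that empty response.

```javascript
// Hypothetical helper: buffers an origin response and decides whether to cache it.
async function bufferForCache(request, response) {
  // Reads the whole body into memory; this delays the first byte to the visitor.
  const bytes = await response.arrayBuffer()
  const init = {
    status: response.status,
    statusText: response.statusText,
    headers: response.headers,
  }
  const cacheable =
    request.method === 'GET' && response.status === 200 && bytes.byteLength > 0

  // An empty buffer is passed as null so bodiless statuses (204, 304) stay valid.
  const body = bytes.byteLength > 0 ? bytes : null
  return {
    cacheable,
    forClient: new Response(body, init),
    forCache: cacheable ? new Response(body, init) : null,
  }
}
```

In the Worker above it would be called after the non-200 early return: put `forCache` into the cache only when `cacheable` is true, and return `forClient` to the visitor either way.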

I noticed this with Cloudflare APO. It really occurs when you have lots of posts (talking 10k). A few reports came in, all random and odd.

Turning off Argo routing helped, but it still occurs. I gave up on this when the CF devs wanted a HAR file: I sent it to them and got a reply saying “great, please send us a HAR file”. Gave up at that point.

So yeah, if APO Workers show this, it’s a bigger issue than just your Worker.

Thanks for chiming in, Josh. I’m actually not using APO or Argo. What plan are you on?

Sadly I’m not surprised by Cloudflare’s lack of acknowledgement on the issue. It appears the only solution is to write defensive code in the Worker which checks for this case, but even that is a challenge because it’s so wildly unpredictable. I can’t test or debug my own code until someone else reports a problem and follows up with all the necessary information.

Very disappointing that Workers and the CF cache are so unreliable.

I am on the Pro plan, but Cloudflare APO uses Workers similar to what you have. I was basically saying I’ve seen this with official Cloudflare products and seen Reddit comments from users experiencing the same, so I think the issue is bigger than just your Worker. Again, I gave up when support kept asking for a HAR file: it was attached, and they totally ignored it and asked when I was sending it. :laughing:

Cloudflare keeps releasing tons of new products without fixing or improving them. It seems every few months something new and cool comes out. Mirage has been in beta for how many years, for example, and APO took 16 months to fix the mobile cache issue. I guess it’s a tease to upgrade to Enterprise. I would go to Business right now, but nothing would improve; I would still get “upgrade to Enterprise”. Meh.

They should just stick to DDoS protection, which they are fantastic at, and stop making half-baked products that are released and then have development paused because the devs are pushed to make a new product for next month’s week of Cloudflare hoo-ha.

@JoshJ @deltahf are you still having this issue with Workers? Maybe you’d get a better response for Workers on the CF Dev Discord server?