We have an infrequent but unpredictable issue with our static site that causes our users to see 404s for JS resources for around 1 hour following some code deployments.
Our deployment simply overwrites the old index.html in Google Cloud Storage with the new version (plus copying in all other resources with version-specific filenames).
Without proxying in Cloudflare, requests go directly to GCP and users are immediately able to see new changes and a fully-functioning site following deployments.
With proxying enabled in Cloudflare, we have situations where all users see the new index.html, which requests app-v2.js, and are presented with a 404 for it, despite the resource being available in GCS.
This indicates Cloudflare is successfully requesting the HTML from the origin, but not requesting the JS resources from the origin. Our only non-default zone setting is Browser Cache TTL, which we have set to Respect Existing Headers. Once encountered, this behaviour is consistent across all users for 1 hour, with or without browser cache enabled, and is extremely disruptive, as you can imagine.
What we believe is happening:
- A user accesses our site during the deployment, after the new index.html is uploaded but before app-v2.js has been uploaded.
- Cloudflare immediately serves the new index.html to the user, but reports that the JS resource it references is not found (404).
- Cloudflare caches the 404 response.
- All subsequent requests from users to access the site return 404 for that resource.
- After one hour, the Cloudflare edge cache refreshes from the origin and begins serving app-v2.js as a 200 response.
I have the following questions:
- Are my above assumptions correct?
- If so, why does Cloudflare cache 404s? Should 404s be cached on Cloudflare if the origin is a cloud-hosted object store (e.g. GCS, S3)?
- Is it possible to disable caching of 404 responses?
- What setting could possibly be causing the 1-hour cache? We have not enabled Origin Cache Control on our zone.
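One way to approach the last two questions empirically is to inspect the headers on the cached 404 itself. The sketch below uses only the Python standard library; inspect and edge_ttl_hint are hypothetical helper names, and edge_ttl_hint is a simplification (s-maxage first, then max-age) rather than Cloudflare's exact TTL rules, which also depend on zone settings. CF-Cache-Status and Age are real response headers. GCS's documented default for publicly readable objects without their own Cache-Control is public, max-age=3600, so a max-age=3600 appearing here would be a strong hint, but whether it applies to the 404 is exactly what the check should reveal.

```python
import urllib.error
import urllib.request


def edge_ttl_hint(cache_control):
    """Rough TTL (seconds) a shared cache might derive from a Cache-Control value.

    Simplification: s-maxage wins over max-age; returns None if neither is set.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    for key in ("s-maxage", "max-age"):
        if directives.get(key, "").isdigit():
            return int(directives[key])
    return None


def inspect(url):
    """Fetch `url` and report the headers relevant to edge caching."""
    try:
        with urllib.request.urlopen(url) as resp:
            status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:  # 404s are raised as HTTPError
        status, headers = err.code, err.headers
    cc = headers.get("Cache-Control", "")
    return {
        "status": status,
        "cache-control": cc,
        "cf-cache-status": headers.get("CF-Cache-Status"),  # e.g. HIT or MISS
        "age": headers.get("Age"),  # seconds the object has sat in cache
        "ttl_hint": edge_ttl_hint(cc),
    }
```

Usage, with a placeholder URL: print(inspect("https://example.com/app-v2.js")). A 404 with CF-Cache-Status: HIT and a growing Age would confirm the 404 is being served from the edge rather than from GCS.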
Additionally, I would like to ask about best practices for UI deployments of this kind on high-traffic sites, as I believe our architecture and deployment method are very common. I'd appreciate feedback on things like:
- Cache times/headers
- Purging the cache at every deploy
- Switching to blue-green deployments (assuming that by changing the origin, the cache gets purged)
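On the cache-headers and purging points, one common pattern for this kind of static site is to upload every versioned asset before the index.html that references it, give versioned assets a long immutable Cache-Control, and make index.html revalidate on every request. The sketch below only computes the plan; the actual uploads (for example with the google-cloud-storage client) are left out. ENTRY_POINTS, Upload, plan_deploy and purge_payload are hypothetical names, and the header values are common choices, not a Cloudflare recommendation. purge_payload builds the JSON body for Cloudflare's purge-by-URL call (POST /client/v4/zones/{zone_id}/purge_cache).

```python
from dataclasses import dataclass

# Hypothetical: files whose names do NOT change between releases.
ENTRY_POINTS = {"index.html"}


@dataclass(frozen=True)
class Upload:
    path: str
    cache_control: str


def plan_deploy(paths):
    """Order uploads so every versioned asset exists before the HTML that references it."""
    paths = list(paths)
    assets = sorted(p for p in paths if p not in ENTRY_POINTS)
    entries = sorted(p for p in paths if p in ENTRY_POINTS)
    # Versioned names never change content, so they can be cached for a year.
    plan = [Upload(p, "public, max-age=31536000, immutable") for p in assets]
    # Entry points go last and must be revalidated on every request.
    plan += [Upload(p, "no-cache") for p in entries]
    return plan


def purge_payload(base_url, plan):
    """JSON body for Cloudflare's purge-by-URL endpoint, covering entry points only."""
    base = base_url.rstrip("/")
    return {"files": [f"{base}/{u.path}" for u in plan if u.path in ENTRY_POINTS]}


plan = plan_deploy(["index.html", "app-v2.js", "styles-v2.css"])
print([u.path for u in plan])  # ['app-v2.js', 'styles-v2.css', 'index.html']
print(purge_payload("https://example.com", plan))
```

The design choice is that ordering removes the window in which index.html points at a missing file, so there is no 404 to cache in the first place; purging only index.html is then enough, because versioned assets get brand-new URLs on every release.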
Appreciate your time, and thanks in advance for your answers.