Getting a high 'bypass' rate for a cached static HTML page

Good morning,

We’ve recently started caching static HTML pages within our site. We’ve done this by ensuring there’s a cache-control response header set, with a value of public,max-age=14400. We then added a Page Rule to set the Cache Level to Cache Everything.

When testing this using normal browsers, this is working fine - we can see the Cloudflare cache getting populated, and can see us getting hits when it’s in the cache.

However, checking our server-side logs, we’re still seeing a large number of requests that are not being served by the cache and are making it through to the server (e.g., 10,000+ requests in 24 hours). Checking these requests in the logs doesn’t show anything special: they don’t have a querystring and nothing else appears different, although I don’t have access to the request headers to see if anything differs there.

Checking the Cloudflare logs, I can see that for this page, we’re getting a 93% ‘bypass’ rate, and only 5% hit. This doesn’t match the data we’ve got for static resources (e.g., CSS or JS files) which are close to 100% hit. Digging into the Cloudflare data a bit more, we’re getting a large number of requests from bots (specifically BingBot, with over 60% of traffic across the whole site).

My question is this: is it possible for a bot to be bypassing the cache for static HTML pages, and if so, what can be done to avoid this?

Thanks.

Someone can bypass the HTML cache just by sending a cache-control: no-cache header in their request, or by force-refreshing the page in their browser (e.g. Ctrl+Shift+R).

So those bots are probably sending a cache-control: no-cache header in their requests.


Hi,

Thanks for your response. That will bypass the browser cache, yes, but it will still be served as a cached page from Cloudflare. I’ve just tested both of these and both times I’m still getting a hit from Cloudflare.
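For anyone who wants to repeat that test outside a browser, here is a minimal sketch using only the Python standard library. The function names and the URL are hypothetical; the only thing taken from Cloudflare is the CF-Cache-Status response header, which reports HIT, MISS, BYPASS and so on.

```python
import urllib.request
from typing import Optional


def build_no_cache_request(url: str) -> urllib.request.Request:
    """Build a GET request that mimics a forced refresh (Cache-Control: no-cache)."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Cache-Control", "no-cache")
    return request


def fetch_cache_status(url: str) -> Optional[str]:
    """Send the request and return the CF-Cache-Status header, or None if absent."""
    with urllib.request.urlopen(build_no_cache_request(url), timeout=10) as response:
        return response.headers.get("CF-Cache-Status")


# Usage (hypothetical URL; needs network access and a Cloudflare-proxied site):
# print(fetch_cache_status("https://www.example.com/static-page.html"))
```

Running it a few times in a row shows whether the no-cache request header alone changes the reported status.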

Hmm, that’s interesting. Perhaps it’s because I’m using APO to cache WordPress sites, so I get that behavior when a cache-control: no-cache header is included in the request.

Check the CF-Ray response header (the suffix after the Ray ID identifies the data center) to see which Cloudflare data centers are serving the bypassed requests, as the Cloudflare CDN cache is per data center by default. With 215 data centers, up to 215 first-time, non-cached requests can occur per edge cache TTL period. So if you cache for 1 hour, up to 215 non-cached requests could reach the origin every hour.
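A rough sketch of both ideas in Python: pulling the data center code out of a CF-Ray value (which looks like `230b030023ae2822-SJC`) and working out the upper bound described above. The function names are made up for illustration, and the bound assumes exactly one cold fetch per data center per TTL period, which is a simplification.

```python
import math


def colo_from_cf_ray(cf_ray: str) -> str:
    """Return the data center code from a CF-Ray value like '230b030023ae2822-SJC'."""
    ray_id, separator, colo = cf_ray.strip().rpartition("-")
    if not separator or not ray_id or not colo:
        raise ValueError(f"unexpected CF-Ray format: {cf_ray!r}")
    return colo.upper()


def max_cold_fetches(data_centers: int, edge_ttl_seconds: int, window_seconds: int) -> int:
    """Upper bound on non-cached requests in a time window.

    Assumes each data center fetches from the origin once per TTL period.
    """
    ttl_periods = math.ceil(window_seconds / edge_ttl_seconds)
    return data_centers * ttl_periods


print(colo_from_cf_ray("230b030023ae2822-SJC"))   # SJC
print(max_cold_fetches(215, 3600, 3600))          # 215
print(max_cold_fetches(215, 14400, 86400))        # 1290
```

With the max-age of 14400 from the original post, that bound is about 1,290 origin requests per day for one URL, well below the 10,000+ reported, so cold caches alone would not explain the numbers.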


But wouldn’t Cloudflare return MISS rather than BYPASS in that case?

Yeah, if you have Cache Everything plus an edge cache TTL set, then a MISS is returned.


Check what is allowed to bypass (e.g. IP addresses, methods, URLs, AS (autonomous system) numbers, etc.) and change the rule or rules if needed. Can you show me a screenshot of your firewall rules so I can tell you what might need to be changed?

Thanks - I’ve gone through our firewall list, and we’ve only got a couple of rules that have an action of Bypass, and whilst the firewall dashboard is showing some activity with these rules, it’s still only about 15% of the total bypass responses we’re seeing in the caching dashboard.

Within the firewall dashboard activity log, I can see some bypassed activities, but nothing for the static page for which we’re seeing a high bypass rate.

Are there any other rules that might be triggering this bypass, other than the firewall rules that we’ve set up ourselves?

We’ve finally got to the bottom of this, and it wasn’t an issue within Cloudflare at all.

In case anyone else has a similar problem, it was because Azure was adding a cookie for ARR Instance Affinity. The first request to a cached page (from any device or IP) would always result in a BYPASS because Azure would be responding with these cookies. However in the second request to a cached page (where the ARR Instance Affinity cookie is sent with the request), Azure would pick up the cookie and not need to set it again, and so the response could be cached correctly.
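If you want to check your own origin for the same thing, a small sketch like the one below can flag response headers that would stop a page being cached. It is only an approximation of Cloudflare’s rules, not their actual logic. The helper name is hypothetical, and the cookie names ARRAffinity and ARRAffinitySameSite are an assumption about what Azure App Service sets, so check them against your own responses.

```python
from typing import Iterable, List, Tuple

# Assumed names of the Azure ARR affinity cookies (compared case-insensitively).
ARR_COOKIE_NAMES = {"arraffinity", "arraffinitysamesite"}
# Cache-Control directives that mark a response as not cacheable by a shared cache.
UNCACHEABLE_DIRECTIVES = {"private", "no-store", "no-cache"}


def bypass_reasons(headers: Iterable[Tuple[str, str]]) -> List[str]:
    """List the reasons an origin response would likely not be cached."""
    reasons = []
    for name, value in headers:
        header = name.lower()
        if header == "set-cookie":
            cookie_name = value.split("=", 1)[0].strip()
            is_arr = cookie_name.lower() in ARR_COOKIE_NAMES
            label = "ARR affinity cookie" if is_arr else "cookie"
            reasons.append(f"Set-Cookie present ({label}: {cookie_name})")
        elif header == "cache-control":
            directives = {d.strip().split("=", 1)[0].lower() for d in value.split(",")}
            for directive in sorted(directives & UNCACHEABLE_DIRECTIVES):
                reasons.append(f"Cache-Control contains {directive}")
    return reasons


first_response = [
    ("Cache-Control", "public,max-age=14400"),
    ("Set-Cookie", "ARRAffinity=abc123;Path=/;HttpOnly"),
]
second_response = [("Cache-Control", "public,max-age=14400")]

print(bypass_reasons(first_response))   # one reason: the ARR affinity cookie
print(bypass_reasons(second_response))  # []
```

The two sample responses mirror the first and second requests described above: same Cache-Control header, but only the first carries Set-Cookie.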

We were getting such high BYPASS rates because the requests to some data centres would only ever come from bots, which I’d guess never send a request that includes an ARR Instance Affinity cookie.

It was a simple fix in our application to disable ARR Instance Affinity for cached pages, and that prevents the cookie from being returned, and so the page can be cached on the first request.
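How to do this depends on your stack. As one illustration only, and not necessarily what we did, here is a Python WSGI middleware sketch that adds the Arr-Disable-Session-Affinity: true response header for cacheable paths. To the best of my knowledge, Azure App Service honours that header by not issuing the affinity cookie, but verify it against the Azure documentation. The middleware name and the path check are hypothetical.

```python
from typing import Callable


def disable_arr_affinity(app, is_cached_path: Callable[[str], bool]):
    """Wrap a WSGI app so responses for cacheable paths opt out of ARR affinity."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            if is_cached_path(environ.get("PATH_INFO", "")):
                # Assumed Azure App Service behaviour: this header suppresses
                # the ARR affinity Set-Cookie on the response.
                headers = list(headers) + [("Arr-Disable-Session-Affinity", "true")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def demo_app(environ, start_response):
    """Tiny stand-in application that serves a cacheable HTML response."""
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Cache-Control", "public,max-age=14400"),
    ])
    return [b"<html>static page</html>"]


# Treat anything ending in .html as a cached page (illustrative rule only).
application = disable_arr_affinity(demo_app, lambda path: path.endswith(".html"))
```

Limiting the header to cached pages keeps affinity available for any pages that still rely on it.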

Thanks for all the suggestions here - it really helped us figure out what was going on.

