Getting MISS before Max-Age is met

We have set Cache Everything, with Origin Cache Control On across all our site’s pages.

I am setting Max-Age to two days (172800) for pages that get hit by a lot of bot traffic, to keep that traffic away from our server, but we’re still seeing these requests in our logs.

Testing this afternoon, I can see pages in our HTTP logs that are already showing MISS within an hour. In Chrome, I have gone back through my history, and some pages that were showing HIT an hour ago are now showing MISS, while other pages I checked at a similar time are still showing HIT. I’m finding this all pretty hard to test properly, but it seems that Cloudflare is expiring or purging these pages early.

Here are the headers for the first request (MISS):

cache-control: public, max-age=172800
cf-cache-status: MISS
cf-ray: 56c3d4dfeb35d224-MAN
content-encoding: gzip
content-type: text/html; charset=utf-8
date: Fri, 28 Feb 2020 16:38:55 GMT
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
last-modified: Tue, 04 Jun 2019 16:15:31 GMT
server: cloudflare
status: 200
vary: Accept-Encoding

Then after a refresh:

age: 642
cache-control: public, max-age=172800
cf-cache-status: HIT
cf-ray: 56c3e4920879e597-MAN
content-encoding: gzip
content-type: text/html; charset=utf-8
date: Fri, 28 Feb 2020 16:49:37 GMT
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
last-modified: Tue, 04 Jun 2019 16:15:31 GMT
server: cloudflare
status: 200
vary: Accept-Encoding
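
For anyone who wants to check these numbers from a script rather than by eye, here is a minimal Python sketch. It only parses the header text quoted above; the function names (parse_headers, max_age_seconds, remaining_freshness) are made up for illustration and are not part of any Cloudflare tooling.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_headers(block: str) -> Dict[str, str]:
    """Turn 'name: value' lines into a dict keyed by lower-case header name."""
    headers = {}
    for line in block.strip().splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age directive in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def remaining_freshness(headers: Dict[str, str]) -> Optional[int]:
    """Seconds left before max-age is reached, based on the Age header."""
    limit = max_age_seconds(headers.get("cache-control", ""))
    if limit is None:
        return None
    return limit - int(headers.get("age", "0"))


# Subset of the two responses quoted above.
MISS = parse_headers("""
cache-control: public, max-age=172800
cf-cache-status: MISS
date: Fri, 28 Feb 2020 16:38:55 GMT
""")
HIT = parse_headers("""
age: 642
cache-control: public, max-age=172800
cf-cache-status: HIT
date: Fri, 28 Feb 2020 16:49:37 GMT
""")

elapsed = parsedate_to_datetime(HIT["date"]) - parsedate_to_datetime(MISS["date"])
print(int(elapsed.total_seconds()))  # seconds between the two requests
print(remaining_freshness(HIT))      # seconds of max-age still unused
```

The two date headers are 10 minutes 42 seconds apart, which matches the age of 642 on the HIT, so the cached copy was stored by the first request and was nowhere near its two-day max-age at that point.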

This all looks right to me. However, after an hour or so that page will show MISS again.

Any ideas?

Hi @TechColin
What type of requests?
Do they always end with a particular URI?

First thought: if you are 100% sure these requests are malicious, you can mitigate some of them by turning Under Attack mode on. Of course, that will not solve your problem permanently, but it will stop these (malicious) requests from hitting your server until you find out exactly what is going on.

Then you can analyze those requests and apply firewall filters:

You can also find out more about bots and filtering in the Learning Center:
https://www.cloudflare.com/learning/bots/what-is-bot-traffic/

Hi

These aren’t really DoS-type attacks, as the requests aren’t frequent enough. They are just automated processes, screen scrapers and backlink checkers that use a regular browser User Agent.

The main aim is to have as many pages as possible delivered from Cloudflare so we can handle all kinds of traffic spikes.

I’ve maybe created some confusion of my own here with my local testing. All pages now seem to be returning HIT, however there are definitely pages in the logs from today that aren’t cached.

I’ve read that Cloudflare won’t cache the page if we are setting cookies. That might explain some of the log entries: if that was a user’s first page, then we’ll be setting various cookies. Is it feasible that the page wasn’t cached unless they, or another user, viewed it again?

@TechColin, I recommend reading up on exactly how caching works on Cloudflare:

If you can also log the Cloudflare cf-ray header in your HTTP logs, you can verify which Cloudflare datacenter, and thus which visitor, is hitting your origin, as the Cloudflare cache is per datacenter.

So in theory, with 200+ Cloudflare datacenters, your origin can be hit with a cache miss for the same page at least 200 times, once from each datacenter, until all datacenter caches are populated.
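
As a rough illustration of that kind of analysis, here is a small Python sketch that groups origin requests by page and by the datacenter code at the end of the cf-ray value (for example MAN in 56c3d4dfeb35d224-MAN). It assumes you have already pulled (path, cf-ray) pairs out of your origin log; the function names, paths and all ray IDs except the first are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def colo_from_ray(cf_ray: str) -> str:
    """Return the datacenter code that follows the last hyphen in a cf-ray value."""
    return cf_ray.rsplit("-", 1)[-1].upper()


def origin_fetches_by_colo(
    entries: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, int]]:
    """Count origin requests per path and per datacenter.

    entries: (path, cf_ray) pairs taken from the origin access log.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path, cf_ray in entries:
        counts[path][colo_from_ray(cf_ray)] += 1
    return {path: dict(colos) for path, colos in counts.items()}


def repeated_fetches(counts: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """Datacenters that fetched the same path from the origin more than once."""
    return {
        path: sorted(colo for colo, n in colos.items() if n > 1)
        for path, colos in counts.items()
    }


# Hypothetical log extract; only the first ray ID comes from this thread.
entries = [
    ("/page-a", "56c3d4dfeb35d224-MAN"),
    ("/page-a", "1111111111111111-LHR"),
    ("/page-a", "2222222222222222-MAN"),
    ("/page-b", "3333333333333333-AMS"),
]

counts = origin_fetches_by_colo(entries)
print(counts)
print(repeated_fetches(counts))
```

One origin request per datacenter for a page is what you would expect while the caches fill. The same datacenter fetching the same page again well inside the max-age window is the pattern worth investigating.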

I set up Nginx origin logging for the Cloudflare cf-ray header and other details myself, which would be handy for this kind of analysis: https://community.centminmod.com/threads/cloudflare-custom-nginx-logging.14790/

