High number of requests to origin for the same resource

Hi

I’ve set up caching through Cloudflare on a CNAME setup (shouldn’t matter).
From the statistics, I see that only 90% of my requests are being cached, which means 10% misses.
I just migrated from CloudFront, which had 99.9% HITS, so I set out to see if I could find out why Cloudflare would cache so much less.

I see this in my request log on my origin:
(I’ve obfuscated the IP addresses, but the uniqueness is kept)

2018-11-28 10:24:34 1.2.3.4 GET /scripts/bundles/search-script.min.js _=1543400673397 80 - 4.5.6.7 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/70.0.3538.110+Safari/537.36 http://domain-a.dk/et-til-artificial-techincal-icatalogue/ 200 0 0 2
2018-11-28 10:24:34 1.2.3.4 GET /scripts/bundles/search-script.min.js _=1543400675274 80 - 4.5.6.7 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/70.0.3538.110+Safari/537.36 https://www.domain-b.ca/WowActions/ 200 0 0 3
2018-11-28 10:24:34 1.2.3.4 GET /scripts/bundles/search-script.min.js _=1543400675221 80 - 4.5.6.7 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_14_1)+AppleWebKit/605.1.15+(KHTML,+like+Gecko)+Version/12.0.1+Safari/605.1.15 http://domain-c.dk/ 200 0 0 3
2018-11-28 10:24:35 1.2.3.4 GET /scripts/bundles/search-script.min.js _=1543400675558 80 - 4.5.6.7 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/70.0.3538.110+Safari/537.36 http://domain-d.dk/organisation/blabla/companyblabla/wow-tours/ 200 0 0 3
2018-11-28 10:24:35 1.2.3.4 GET /scripts/bundles/search-script.min.js - 80 - 4.5.6.7 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/70.0.3538.110+Safari/537.36 http://domain-e.com/cms/contact/index.shtml 200 0 0 3

Why do I get 5 requests to the same resource (/scripts/bundles/search-script.min.js) within 1-2 seconds?
The only difference I can see here is that the user-agent differs. Shouldn’t it be cached after the first request?

I’m basically looking for a way to have that resource cached for all requests, regardless.

My caching strategy is set to Standard. I’ve furthermore made a page rule for domain/* that sets “Cache everything”, but still no effect.

Any hints?

Hey there,

Cloudflare today has 155+ datacenters, and each of them can send requests to your origin to fill its own cache, which could explain why you receive more than one request per asset.

I’d suggest you look at Argo Tiered Caching. It could improve this by electing some Tier 1 POPs, so that all Tier 2 POPs communicate with them to get the asset into their cache. This would dramatically decrease the number of requests going to your origin: https://www.cloudflare.com/products/argo-smart-routing/

I hope this makes sense,

Hi again

So you will be fetching the asset from my origin from each of your datacenters? That sounds good to me, but it still does not add up to the numbers I see.

From 8:20 to 11:40 (200 minutes), I’m seeing 87,775 requests to my origin for /scripts/bundles/search-script.min.js, coming from Cloudflare.

It has a max-age set to 5 minutes, and Cloudflare is set to respect the original expiration headers.

If one datacenter made a request to my origin every 5 minutes, that would mean 40 requests in those 200 minutes. If all 155 datacenters did this, that would mean 155*40 = 6,200 requests over those 200 minutes. That leaves a gap of 81,575 requests unaccounted for.
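For anyone wanting to rerun that estimate with their own numbers, here is a minimal sketch of the same arithmetic in Python. The figures are the ones quoted in this thread; the variable names are only illustrative, and the model assumes each datacenter refetches exactly once per max-age window.

```python
# Back-of-the-envelope check of the figures quoted above.
WINDOW_MINUTES = 200        # 8:20 to 11:40
MAX_AGE_MINUTES = 5         # max-age set on the asset
DATACENTERS = 155           # datacenter count quoted earlier in the thread
OBSERVED_REQUESTS = 87_775  # origin requests seen for the asset

# Assumption: one refetch per datacenter per max-age window.
refreshes_per_datacenter = WINDOW_MINUTES // MAX_AGE_MINUTES
expected_requests = DATACENTERS * refreshes_per_datacenter
unexplained = OBSERVED_REQUESTS - expected_requests

print(refreshes_per_datacenter)  # 40
print(expected_requests)         # 6200
print(unexplained)               # 81575
# How many times larger the observed traffic is than the estimate.
print(round(OBSERVED_REQUESTS / expected_requests, 1))  # 14.2
```

So the observed volume is roughly 14 times what per-datacenter expiry alone would explain under this simple model.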

Do you have any idea what’s at play here?

Can you post the actual IP addresses? Could it be that they are bypassing Cloudflare and going straight to your server? Are you rewriting IP addresses at the web server level?

Currently, these logs come from a server sitting behind a reverse proxy, so all I see is the IP of the reverse proxy (nginx, in this case).

I can try to enable some logging on the reverse-proxy temporarily to get a slice of information from it.

I’m pretty sure it doesn’t bypass every time, as this single resource is 50% of all requests, and I’ve had 1.7 million requests in total for that period of time.

You should definitely log the client addresses, otherwise it is close to impossible to say what might be going on.

It could be the caching of the assets themselves; I suggest you check that they return cf-cache-status: HIT. It could also be the cache key: are those assets called with a query string? By default we vary the cache based on the query string, but you can exclude it through the caching level setting in the “Caching” tab.
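To illustrate why a changing query string matters, here is a simplified sketch in Python. It is not Cloudflare's actual cache key logic; `cache_key` is a hypothetical helper and `example.com` is a placeholder host. The `_` values are taken from the origin log posted at the top of the thread.

```python
from urllib.parse import urlsplit

def cache_key(url: str, ignore_query_string: bool) -> str:
    """Toy cache key: host + path, plus the query string unless ignored."""
    parts = urlsplit(url)
    key = f"{parts.netloc}{parts.path}"
    if not ignore_query_string and parts.query:
        key += "?" + parts.query
    return key

# Same script, three different cache-busting values (host is a placeholder).
urls = [
    "https://example.com/scripts/bundles/search-script.min.js?_=1543400673397",
    "https://example.com/scripts/bundles/search-script.min.js?_=1543400675274",
    "https://example.com/scripts/bundles/search-script.min.js?_=1543400675221",
]

keys_with_query = {cache_key(u, ignore_query_string=False) for u in urls}
keys_without_query = {cache_key(u, ignore_query_string=True) for u in urls}

print(len(keys_with_query))     # 3 distinct keys, so 3 separate cache entries
print(len(keys_without_query))  # 1 key, so one shared cache entry
```

In this model every new `_` value is a fresh cache entry, which would show up as a MISS and a request to the origin.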

My contribution will be limited without more technical information such as a full link to this asset.

As @sandro said, I’d also verify that all those IPs are ours; you can find our ranges here: https://www.cloudflare.com/ips/
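Checking addresses against a list of ranges can be scripted with Python's standard `ipaddress` module. This is only a sketch: `is_from_ranges` is a hypothetical helper, and the single range below is a sample entry that I am assuming is on the published list. Paste in the current ranges from the page linked above before relying on the result.

```python
import ipaddress

def is_from_ranges(ip: str, cidr_ranges: list) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)

# Assumption: sample entry only; replace with the full published list.
SAMPLE_RANGES = ["162.158.0.0/15"]

print(is_from_ranges("162.158.74.37", SAMPLE_RANGES))  # True
print(is_from_ranges("203.0.113.10", SAMPLE_RANGES))   # False (documentation address)
```

Any client address that falls outside the published ranges would be reaching the origin without going through Cloudflare.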

Hi again

In the meantime, I saw that the resource in question has a unique ID appended to the query string. I thought for a moment that this was the explanation. I then went into Cloudflare, set the Caching Level to Ignore query string, and thought the case was closed. I continued to monitor, and to my surprise, I’m still seeing the same resource being fetched from my origin a large number of times. The difference is that I now have the IP of the edge location, so I can verify whether it’s the same requester between requests.

See here:

162.158.74.37 - - [28/Nov/2018:12:40:09 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408807374 HTTP/1.1" 200 119492 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1"
162.158.74.37 - - [28/Nov/2018:12:43:04 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408985718 HTTP/1.1" 200 119492 "https://domain-a/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"

162.158.134.158 - - [28/Nov/2018:12:43:04 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408984325 HTTP/1.1" 200 119492 "http://domain-b.dk/en/xx/research-xx-and-units/conamore/conferences/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
162.158.134.158 - - [28/Nov/2018:12:43:12 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408992043 HTTP/1.1" 200 119492 "http://domain-c.dk/tilvalg/xx-tilvalg/yy/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8"

The same resource (if you don’t take the query string into account) is being fetched multiple times by the same edge location at Cloudflare (verified against the Cloudflare IP list).
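A quick way to spot this pattern in a larger log is to count fetches per client IP and path with the query string stripped. The sketch below assumes the nginx "combined" style format shown above; `count_fetches` is a hypothetical helper, and the sample lines are abbreviated from the excerpt (referrer and user agent replaced with "-").

```python
import re
from collections import Counter

# Matches: ip - - [time] "METHOD target PROTOCOL" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*"'
)

def count_fetches(lines):
    """Count requests per (client IP, path), ignoring the query string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        path = match.group("target").split("?", 1)[0]
        counts[(match.group("ip"), path)] += 1
    return counts

# Abbreviated from the log excerpt above (referrer and user agent shortened).
sample = [
    '162.158.74.37 - - [28/Nov/2018:12:40:09 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408807374 HTTP/1.1" 200 119492 "-" "-"',
    '162.158.74.37 - - [28/Nov/2018:12:43:04 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408985718 HTTP/1.1" 200 119492 "-" "-"',
    '162.158.134.158 - - [28/Nov/2018:12:43:04 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408984325 HTTP/1.1" 200 119492 "-" "-"',
    '162.158.134.158 - - [28/Nov/2018:12:43:12 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408992043 HTTP/1.1" 200 119492 "-" "-"',
]

fetches = count_fetches(sample)
for (ip, path), n in sorted(fetches.items()):
    print(ip, path, n)
```

With the four sample lines, each of the two edge addresses shows two fetches of the same path.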

How can this be?
How long does it take for a Caching Level change to take effect? I waited 10 minutes after changing it before I captured this example.

Thanks in advance.

Replication time for all changes is up to ~5 seconds, so replication time isn’t the reason here. Could you confirm what the cache-control policy for this asset is, please?

Can you post the URL to that resource?

Yes of course:

This is an example with the query string appended to it:
https://customer.cludo.com/scripts/bundles/search-script.min.js?_=1543410760802

The query string exclusion doesn’t work for me: if I increment the value of _, I get a cf-cache-status: MISS.

The cache itself works: requests for already requested URLs return a HIT. However, as Stephane mentioned, the query string exclusion does not take effect.
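The check described above (bump the `_` value, then read the cf-cache-status response header) can be scripted. This is a rough sketch using only Python's standard library; `cache_status` and `probe` are hypothetical helper names, and the fetch function is injectable so the logic can be tried without touching the network.

```python
import urllib.request

def cache_status(url: str) -> str:
    """Fetch url and return its cf-cache-status header ('' if absent).

    Note: urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("cf-cache-status", "")

def probe(base_url: str, values, fetch_status=cache_status) -> dict:
    """Request base_url with each cache-busting value and record the status."""
    return {value: fetch_status(f"{base_url}?_={value}") for value in values}

# Example call (performs real HTTP requests, so it is left commented out):
# base = "https://customer.cludo.com/scripts/bundles/search-script.min.js"
# print(probe(base, [1543410760802, 1543410760803]))
```

If the query string were really excluded from the cache key, a new `_` value should come back as HIT once the asset is cached at that location; a MISS on every new value suggests the query string is still part of the key.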

Check that once more.

Found the issue: you’ve got a page rule for the wildcard that sets a different caching level, and this page rule takes precedence.

I’d suggest you change the page rule from Cache Everything to Origin Cache Control. We’ll then respect the cache-control headers sent by your origin, while excluding the query string from the cache key.

Wow, that seemed to fix it - THANKS for world class support :smiley:

Wouldn’t it also be fine to just disable the page rule?

No worries :slight_smile:

No, you can’t, as CF only caches a specific list of extensions by default. You want to activate Origin Cache Control to raise your caching ratio - https://support.cloudflare.com/hc/en-us/articles/200172516-Which-file-extensions-does-Cloudflare-cache-for-static-content-

Thanks @sandro for jumping on this too, appreciated! Nice team work :slight_smile:

That makes sense - thank you.

I’ll mark this as solved.

Hi, I hope it’s OK to ask here: are you saying that with Argo enabled, when something gets cached in datacenter X, it gets propagated to all Cloudflare datacenters?

Because I don’t think that is actually happening.

Hey @boynet2,

Not exactly. With Argo Tiered Caching, we elect some Tier 1 POPs that are authorised to talk to the origin, and all Tier 2 POPs talk to those Tier 1 POPs to get assets they don’t have in their own cache. That’s not propagation as such: a request needs to reach a Tier 2 POP for the asset to end up in its local cache.
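A toy model may make the difference clearer. The sketch below is not how Cloudflare implements tiering; the `Origin` and `Pop` classes are made up for illustration, there is a single Tier 1 POP, and TTLs and expiry are ignored entirely.

```python
class Origin:
    """Stands in for the origin server and counts the requests it receives."""
    def __init__(self):
        self.requests = 0

    def fetch(self, path):
        self.requests += 1
        return f"body of {path}"

class Pop:
    """A cache that fills itself from its upstream (origin or another POP)."""
    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream
        self.cache = {}

    def fetch(self, path):
        if path not in self.cache:  # local miss: ask upstream
            self.cache[path] = self.upstream.fetch(path)
        return self.cache[path]

# Without tiering: every POP talks to the origin directly.
flat_origin = Origin()
flat_pops = [Pop(f"pop-{i}", flat_origin) for i in range(5)]
for pop in flat_pops[:3]:  # visitors hit only 3 of the 5 POPs
    pop.fetch("/asset.js")

# With tiering: Tier 2 POPs go through one Tier 1 POP.
tiered_origin = Origin()
tier1 = Pop("tier1", tiered_origin)
tier2_pops = [Pop(f"tier2-{i}", tier1) for i in range(5)]
for pop in tier2_pops[:3]:
    pop.fetch("/asset.js")

print(flat_origin.requests)    # 3
print(tiered_origin.requests)  # 1
# Only the Tier 2 POPs that actually received a request hold the asset.
print(sum("/asset.js" in pop.cache for pop in tier2_pops))  # 3
```

The last line shows the "no propagation" point: the two Tier 2 POPs that never received a request still have nothing cached, but when they do, they will be served by Tier 1 instead of the origin.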

Why are you asking? Could you share your configuration with me? Argo has 3 components, so saying Argo is activated isn’t enough; I need to know which component is activated on your domain:

  • Smart Routing
  • Tiered Caching
  • Tunnel

I hope it makes sense,

I just have Traffic -> Argo turned on. Where do I enable Tiered Caching? (I am on the Pro plan.)

I am asking because my URLs’ TTLs are short (1 minute) and I am really trying to lower my miss rate. Because my site’s requests are evenly divided among 5-6 locations, I am seeing a lot of misses. I am working on a Worker right now to help with that, but what you described could help me.