High number of requests to origin for the same resource

Can you post the actual IP addresses? Could it be that they are bypassing Cloudflare and going straight to your server? Are you rewriting the IP address at the web server level?

Currently, these logs come from a server sitting behind a reverse proxy, so all I see is the IP of the reverse proxy (nginx, in this case).

I can try to enable some logging on the reverse-proxy temporarily to get a slice of information from it.

I’m pretty sure it doesn’t bypass every time, as this single resource is 50% of all requests, and I’ve had 1.7 million requests in total for that period of time.

You should definitely log the client addresses, otherwise it is close to impossible to say what might be going on.

Could be the caching of the assets themselves; I suggest you check that the responses carry cf-cache-status: HIT. Could also be the cache key varying: are those assets called with a query string? By default we vary the cache on the query string, but you can exclude it under Caching Level in the “Caching” tab.

My contribution will be limited without more technical information such as a full link to this asset.

As @sandro said, I’d also verify that all those IPs are ours; you can find our ranges here: https://www.cloudflare.com/ips/
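
A minimal sketch of that range check, using Python's standard `ipaddress` module. The two CIDR blocks in `SAMPLE_RANGES` are sample entries assumed for illustration only; the current list should be copied from https://www.cloudflare.com/ips/ before relying on the result. The function name `is_in_ranges` is hypothetical.

```python
import ipaddress

# Sample entries only (an assumption for this sketch). Replace them with the
# full, current list published at https://www.cloudflare.com/ips/.
SAMPLE_RANGES = ["162.158.0.0/15", "173.245.48.0/20"]


def is_in_ranges(ip, ranges):
    """Return True if the address falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(block) for block in ranges)
```

Calling `is_in_ranges("162.158.74.37", SAMPLE_RANGES)` on a proxy address from an access log tells you whether the request arrived through one of the listed ranges or reached the server some other way.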

Hi again

In the meantime, I saw that the resource in question has a unique ID appended to the query string. I thought for a moment that this was the explanation. I then went into Cloudflare, set the Caching Level to Ignore query string, and thought the case was closed. I continued to monitor and, to my surprise, I’m still seeing the same resource being fetched from my origin a large number of times. The difference is that I now have the IP of the edge location, so I can verify whether it’s the same requester between requests.

See here:

162.158.74.37 - - [28/Nov/2018:12:40:09 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408807374 HTTP/1.1" 200 119492 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1"
162.158.74.37 - - [28/Nov/2018:12:43:04 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408985718 HTTP/1.1" 200 119492 "https://domain-a/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"

162.158.134.158 - - [28/Nov/2018:12:43:04 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408984325 HTTP/1.1" 200 119492 "http://domain-b.dk/en/xx/research-xx-and-units/conamore/conferences/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
162.158.134.158 - - [28/Nov/2018:12:43:12 +0000] "GET /scripts/bundles/search-script.min.js?_=1543408992043 HTTP/1.1" 200 119492 "http://domain-c.dk/tilvalg/xx-tilvalg/yy/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8"

The same resource (if you don’t take the query string into account) is being fetched multiple times by the same Cloudflare edge location (verified against the Cloudflare IP list).

How can this be?
How long does it take for a Caching Level change to take effect? I waited 10 minutes after changing it before I captured this example.

Thanks in advance.
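
The observation above (same edge IP, same path, different query string) can be checked over a whole log file rather than by eye. This is a rough sketch that assumes the nginx combined log format shown in the excerpt; the regular expression and the function name `count_by_ip_and_path` are illustrative, not part of any tool mentioned in the thread.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the start of a combined-format access log line:
# client IP, two ignored fields, [timestamp], "METHOD target PROTOCOL"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*"'
)


def count_by_ip_and_path(lines):
    """Count requests per (client IP, path), ignoring the query string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        path = urlsplit(match.group("target")).path
        counts[(match.group("ip"), path)] += 1
    return counts
```

A count above 1 for a single (IP, path) pair within the asset's cache lifetime suggests the query string is still part of the cache key for that edge address.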

Replication time for all changes is up to ~5 seconds, so replication time isn’t the reason here. Could you confirm what the cache-control policy for this asset is, please?

Can you post the URL to that resource?

Yes of course:

This is an example with the query string appended to it:
https://customer.cludo.com/scripts/bundles/search-script.min.js?_=1543410760802

The query string exclusion doesn’t work for me: if I increment the value of _, I get a cf-cache-status: MISS.

The cache itself works: requests for already requested URLs return a HIT. However, as Stephane mentioned, the query string exclusion does not take effect.
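
A sketch of the probe described above, for anyone who wants to repeat it: request the asset with a changed `_` value and read the `cf-cache-status` response header. The helper names `with_cache_buster` and `cache_status` and the `User-Agent` string are made up for this example; only the standard library is used.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def with_cache_buster(url, value, param="_"):
    """Return the URL with the cache-busting parameter set to a new value."""
    parts = urlsplit(url)
    query = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def cache_status(url, timeout=10):
    """Fetch the URL and return the cf-cache-status header, or None if absent."""
    request = Request(url, headers={"User-Agent": "cache-probe-example"})
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("cf-cache-status")
```

If the query string is really excluded from the cache key, calling `cache_status` on two URLs that differ only in `_` should return HIT for the second one once the first has been cached at that edge location; a MISS each time points at the query string still being part of the key.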

Check that once more.

Found the issue: you’ve got a page rule on the wildcard that sets a different caching level, and this page rule takes precedence.

I’d suggest you change Cache Everything in that page rule to Origin Cache Control; we’ll then respect the cache-control headers sent by your origin while excluding the query string from the cache key.

Wow, that seemed to fix it - THANKS for world class support :smiley:

Wouldn’t it also be fine to just disable the page rule?

No worries :slight_smile:

No, you can’t, as CF only caches a specific list of extensions by default; you want to activate Origin Cache Control to raise your cache hit ratio - https://support.cloudflare.com/hc/en-us/articles/200172516-Which-file-extensions-does-Cloudflare-cache-for-static-content-

Thanks @sandro for jumping on this too, appreciated! Nice team work :slight_smile:

That makes sense - thank you.

I’ll mark this as solved.

Hi, I hope it’s OK to ask here: are you saying that with Argo enabled, when something gets cached in datacenter X, it gets propagated to all Cloudflare datacenters?

Because I don’t think that is actually happening.

Hey @boynet2,

That’s not exactly it. With Argo Tiered Caching, we elect some Tier 1 POPs that are authorised to talk to the origin, and all Tier 2 POPs talk to those Tier 1 POPs to get assets they don’t have in their own cache. That’s not real propagation: a request needs to reach a Tier 2 POP for the asset to land in its local cache.

Why are you asking? Could you share your configuration with me? Argo has 3 components, so saying Argo is activated isn’t enough; I need to know which component is activated on your domain:

  • Smart Routing
  • Tiered Caching
  • Tunnel

I hope it makes sense,

I just have Traffic -> Argo turned on; where do I enable Tiered Caching? (I am on the Pro plan.)

I am asking because my URLs’ TTLs are short (1 minute) and I am really trying to lower my miss rate. Because my site’s requests are evenly divided across 5-6 locations, I am seeing a lot of misses. I am working on a worker right now to help with that, but what you described could help me.

I’ve your solution here :slight_smile:

You could leverage the stale-while-revalidate cache-control directive (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control) from your origin and activate Origin Cache Control on Cloudflare. As a result, we would serve the stale asset (HIT) while we revalidate against your origin. Isn’t that the dream scenario?

For Argo, having Tunnel isn’t enough; you need to have Tiered Caching enabled too, in the Traffic tab.

Thanks a lot for your help,

About stale-while-revalidate: that would really be a dream come true :slight_smile: but it’s the first thing I tried and it doesn’t work the way I expected. There is a long topic about it that got closed: “Does stale-while-revalidate work?”

The worker I am working on is meant to make stale-while-revalidate work the way dfabulich described, or the way Fastly does it.

About Argo, I guess you are on an Enterprise plan? Mine looks like:

Hmm, I’m no expert on the self-serve plans. If you give me your zone name, I could check for you whether or not Tiered Caching is included in your Argo Tunnel subscription.

For stale-while-revalidate, our implementation is indeed limited:

Cache an asset and serve the asset while it is being revalidated
Cache-Control: max-age=600, stale-while-revalidate=30

Indicates that it is fresh for 600 seconds, and it may continue to be served stale for up to an additional 30 seconds to parallel requests for the same resource while the initial synchronous revalidation is attempted.

The “may” means that during the revalidation (after the TTL), if 2 requests reach us at the same time, the first one will be used to refresh the cache and the second will get a direct answer from the stale copy (STALE), so this doesn’t help in your case. I was mistaken… :frowning:

The Cache team is still working on the full implementation of the widely used cache-control directives, bear with us!
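
To make the two time windows in the `max-age=600, stale-while-revalidate=30` example above concrete, here is a small sketch that classifies a cached response by its age. It models only the generic meaning of the two directives, not how Cloudflare's cache decides what to serve; the function names `parse_cache_control` and `freshness` are hypothetical.

```python
def parse_cache_control(header):
    """Split a Cache-Control value into a {directive: value-or-None} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def freshness(header, age_seconds):
    """Classify a cached response of the given age against its Cache-Control."""
    directives = parse_cache_control(header)
    max_age = int(directives.get("max-age") or 0)
    stale_window = int(directives.get("stale-while-revalidate") or 0)
    if age_seconds < max_age:
        return "fresh"
    if age_seconds < max_age + stale_window:
        # May be served stale while a revalidation is in flight.
        return "stale-while-revalidate"
    return "expired"
```

With that header, an age of 599 seconds is still fresh, 615 seconds falls inside the 30-second stale window, and 630 seconds is past it. As described above, the limited implementation only serves the stale copy to requests that arrive in parallel with the revalidating one, so a 1-minute TTL would not see the miss rate drop the way a full implementation would allow.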