Script/service to prime Cloudflare's cache for each node/colo?

caching

#1

Cloudflare support suggested I post my question here.

Like others, our site is significantly faster when requested assets are available in Cloudflare’s cache. This will always be the case, no matter how much we improve the performance of our origin server.

I can run a script whenever our assets/content change, so I was hoping to build a simple function that makes the first request and primes Cloudflare’s cache.

However, I checked with Cloudflare support. Each node/colo caches independently of one another:

An asset is only cached at a specific colo when that asset is requested from that colo. There is no “cache warming” [or replicating] between colos.

So I cannot simply make one request to prime all the colos/nodes. I need some way to loop through all the colos for each asset. Some potential ideas come to mind:

  1. Use webpagetest.org. They have a distributed network from which you can control where (geographically) the client requests come from. Obviously this won’t perfectly match Cloudflare’s network map, but it’s probably better than nothing. However, the most significant problem with this approach is speed. I’ve seen tests queued for 20+ minutes in some cases. Also, I don’t need to replicate a full browser environment, so this approach is likely overkill.

  2. Perhaps there’s a way (request header?) to force Cloudflare to respond from specific nodes/colos? If so, writing a script to handle this would be trivial.

  3. Another 3rd party service with a distributed network that can proxy requests from different locales. I have no idea if something like this exists (at a reasonable cost). Hoping somebody here might have ideas.

FWIW, I only really care about priming our cache in North America.
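For reference, here is a minimal sketch of the "make the first request" part. It assumes the responses carry Cloudflare's `CF-RAY` header (whose suffix is the code of the colo that served the request) and the `CF-Cache-Status` header. The function names, the `cache-primer/0.1` user agent, and the example URL are all hypothetical. Note that this only warms whichever colo the script's own network path lands on, so it does not solve the multi-colo problem by itself; it would have to be run from several locations.

```python
import urllib.request
from typing import Dict, Iterable, Optional, Tuple


def colo_from_ray(cf_ray: Optional[str]) -> Optional[str]:
    """Return the colo code from a CF-RAY value such as '7d0c1a2b3c4d5e6f-SJC'."""
    if not cf_ray or "-" not in cf_ray:
        return None
    # The colo code is whatever follows the last hyphen.
    return cf_ray.rsplit("-", 1)[1].upper() or None


def summarize(headers: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Extract (colo, cache status) from response headers, case-insensitively."""
    lowered = {key.lower(): value for key, value in headers.items()}
    colo = colo_from_ray(lowered.get("cf-ray"))
    status = lowered.get("cf-cache-status", "UNKNOWN").upper()
    return colo, status


def prime(urls: Iterable[str], timeout: float = 10.0) -> Dict[str, Tuple[Optional[str], str]]:
    """Request each URL once and report which colo answered and its cache status."""
    results = {}
    for url in urls:
        request = urllib.request.Request(url, headers={"User-Agent": "cache-primer/0.1"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # download the full body, as a real client would
            results[url] = summarize(dict(response.headers.items()))
    return results


# Hypothetical usage (performs real network requests):
# for url, (colo, status) in prime(["https://www.example.com/app.css"]).items():
#     print(url, colo, status)
```

A `MISS` on the first run followed by a `HIT` on the second, with the same colo code both times, would suggest the asset is now cached at that colo.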


#2

Well, technically I guess that response from support is correct for the default Cloudflare config, but you can take advantage of Cloudflare’s tiered cache. This system uses regional cache tiers, so a request for an asset that is not currently cached at a particular colo is checked against the regional cache. If the asset is not available there, the regional cache retrieves it from origin, and subsequent requests from other colos in that region retrieve it from the regional cache. This reduces the number of requests to origin and improves overall performance and load on the origin server(s).
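To make the lookup order concrete, here is a toy model of the flow described above: edge colo first, then the regional tier, then origin. It is only an illustration, not Cloudflare's actual implementation. The class name and colo codes are made up, and it ignores TTLs, eviction, and how upper tiers are really chosen.

```python
from typing import Dict, Iterable, Set


class TieredCacheModel:
    """Toy model: per-colo edge caches backed by one shared regional cache."""

    def __init__(self, colos: Iterable[str]) -> None:
        self.edge: Dict[str, Set[str]] = {colo: set() for colo in colos}
        self.regional: Set[str] = set()
        self.origin_fetches = 0

    def request(self, colo: str, asset: str) -> str:
        """Serve an asset via a colo and return where it was found."""
        if asset in self.edge[colo]:
            return "edge"
        if asset in self.regional:
            source = "regional"
        else:
            # Only a miss at both levels reaches the origin server.
            self.origin_fetches += 1
            self.regional.add(asset)
            source = "origin"
        # The colo keeps its own copy for subsequent requests.
        self.edge[colo].add(asset)
        return source


model = TieredCacheModel(["SJC", "LAX", "SEA"])
print(model.request("SJC", "/app.css"))  # origin
print(model.request("LAX", "/app.css"))  # regional
print(model.request("SJC", "/app.css"))  # edge
print(model.origin_fetches)              # 1
```

In this model a single priming request fills the regional tier, so the other colos in the region never go back to origin for that asset, although each still takes one regional lookup on its first request.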

We also have some customers who use Cloudflare (with or without Argo) in conjunction with a solution like GCP’s platform to host images on a separate URL (e.g. images.example.com) from the dynamic content served from the origin. There are some discounts available from Google for doing so. I don’t really have any specific data on that approach in terms of performance, but in the US I would imagine it’s probably pretty good.


#3

This is helpful. I thought I had Argo enabled… I did not, but do now.

My motivation for improving cache hit ratios is not just real users, but also Googlebot/SEO. Google is adding page speed as a ranking factor by the end of June.

In order to optimize for at least Googlebot, perhaps I could proxy a request in California. This would make sure that the colo/regional cache is primed before submitting content/index updates to Google.

Is that a good idea? Do you have any customers doing anything like that?

Edit: I dug through our server logs. The majority of Googlebot requests still come from Mountain View.
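A rough sketch of what I have in mind: send the request through a proxy located near Mountain View and repeat it until Cloudflare reports `HIT`, and only then submit the update to Google. The proxy URL is a placeholder, the function names are hypothetical, and this assumes that repeated requests through the same proxy reach the same colo, which I have not verified.

```python
import urllib.request
from typing import Callable


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP and HTTPS requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_cache_status(opener: urllib.request.OpenerDirector, url: str,
                       timeout: float = 10.0) -> str:
    """Fetch a URL and return its CF-Cache-Status value, or 'UNKNOWN' if absent."""
    with opener.open(url, timeout=timeout) as response:
        response.read()
        return (response.headers.get("CF-Cache-Status") or "UNKNOWN").upper()


def warm_until_hit(fetch: Callable[[str], str], url: str, max_attempts: int = 3) -> bool:
    """Call fetch(url) until it reports HIT; give up after max_attempts tries."""
    for _ in range(max_attempts):
        if fetch(url) == "HIT":
            return True
    return False


# Hypothetical usage (performs real network requests through a placeholder proxy):
# opener = build_proxy_opener("http://california-proxy.example.com:8080")
# ready = warm_until_hit(lambda u: fetch_cache_status(opener, u),
#                        "https://www.example.com/app.css")
```

If `warm_until_hit` returns `False`, the asset is probably not cacheable under the current rules, so I would check the cache settings before submitting anything.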

