File is not cached when accessed by IoT device

Hi, we have an embedded device that downloads a file from a specific server location. That server is proxied by Cloudflare, and we have found that the download is very sensitive to network latency: it appears to fail when the file is not cached.

It does not appear that Cloudflare caches the file when it is requested from the IoT device. I attempted 4 downloads (all confirmed from the same proxy IP) and they all failed. When I then open a browser and download the file, the HTTP response header does not indicate a cache “HIT”; it says “REVALIDATED”. After the file is downloaded from the browser, the download to the IoT device appears to work reliably.

We know it is a cacheable file (extension .bin), and after downloading it once with a browser it always returns a cache status of “HIT”.

So my question is: why wouldn’t the file be cached when requested from the IoT device?

One difference is that the device’s HTTP GET is very different from a browser’s. The browser includes many HTTP headers and details; the IoT device only includes the file name and path. Does Cloudflare require these HTTP headers to cache the file?

Hi,
without a specific URL it’s hard to look into, but can you at least elaborate on how you download it with your IoT device?
cURL? Wget? How exactly are you downloading it?
If we know this, we can check if we can reproduce the issue.

Also: what is the size of the .bin file and what cache headers do you send with it? If you don’t know, please use a Page Rule to enforce caching on “*.bin” URLs.

Hi, I appreciate the quick response. The IoT device uses an embedded TCP/IP stack that sends the HTTP GET request. I found that I can reproduce the same GET request with a cURL command such as: curl -H "User-Agent:" -H "Accept:" https://oururl.com/data.bin
This removes all the HTTP headers except for the host.
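For reference, a minimal Python sketch of what such a stripped-down request looks like on the wire. The helper name `build_minimal_get` is made up for illustration, and the sketch only builds the request bytes; it does not open a connection or handle TLS, which the device would still need for an https URL.

```python
def build_minimal_get(host: str, path: str) -> bytes:
    """Build an HTTP/1.1 GET request whose only header is Host."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the one header HTTP/1.1 requires
        "",                      # empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Host and path taken from the example URL above.
request = build_minimal_get("oururl.com", "/data.bin")
print(request)
```

A browser would add User-Agent, Accept, Accept-Encoding and more on top of this.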

When testing with the cURL command I found that the cache status does change from MISS to HIT on the third download. So that disproves my theory about incomplete/missing headers.
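A rough Python sketch of how this check could be scripted: read the CF-Cache-Status header out of each response and report which attempt first returned HIT. The function names and the sample header blocks are invented for illustration; they only mirror the MISS, MISS, HIT sequence described above and are not captured traffic.

```python
from typing import Iterable, Optional


def cache_status(raw_headers: str) -> Optional[str]:
    """Return the CF-Cache-Status value from a raw header block, if present."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive; the status line has no colon.
        if sep and name.strip().lower() == "cf-cache-status":
            return value.strip().upper()
    return None


def first_hit(statuses: Iterable[Optional[str]]) -> Optional[int]:
    """Return the 1-based attempt number of the first HIT, or None."""
    for attempt, status in enumerate(statuses, start=1):
        if status == "HIT":
            return attempt
    return None


# Hypothetical header blocks, one per download attempt.
sample_responses = [
    "HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\nCF-Cache-Status: MISS\r\n",
    "HTTP/1.1 200 OK\r\ncf-cache-status: MISS\r\n",
    "HTTP/1.1 200 OK\r\nCF-Cache-Status: HIT\r\n",
]
observed = [cache_status(r) for r in sample_responses]
print(observed, first_hit(observed))
```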

Perhaps there is something else at play. Is there a specific trigger for the cache? Does the file need to be requested within a specific interval? Currently our download retry logic is set to once every hour.

FYI, the bin file is about 3 MB and there is a page rule set up to cache everything.

No, as long as a request comes in and hits the cache. But please be aware that there are multiple “caches” per POP. Maybe you need a few requests until it is cached.

Yes and no. If you wait too long, Cloudflare will evict the file from the cache because it was not used.

Then let me please introduce you to “Cloudflare Pages”. Push this file to GitHub, then connect GitHub to Cloudflare Pages.

Every time you update the file, Cloudflare Pages will rebuild and republish it. Please use this approach if you want to “host” static files. Cloudflare’s reverse-proxy CDN is not the best way to go for such a thing, unless you have a lot of requests.

Hope that helps.

Hi thanks again for your help.

You mentioned that the cache will be invalidated if the interval in between requests is too long. Is there any specific duration that you can provide, e.g. is it more or less than 1hr? And is this interval anything we have control over with a page rule?

Thank you for the introduction to Cloudflare Pages. So with this solution the file would essentially be hosted in each regional PoP, and then we wouldn’t have to go and “prime” the cache for each region, so to speak?

Yes, of course you can influence this, but not control it. The setting is called “Edge Cache TTL”, but please keep in mind that every TTL is always just a recommendation and therefore the duration is not guaranteed.
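To illustrate "influence but not control", here is a toy Python model. It is not how Cloudflare works internally; `CachedEntry`, `lookup` and the `evicted` flag are made-up names, and the 4 hour TTL is an arbitrary example value. The point is only that the TTL acts as an upper bound: an entry can be dropped earlier, in which case the next request is a MISS even though the TTL has not run out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    stored_at: float       # seconds: when the edge stored the object
    edge_ttl: float        # seconds: configured upper bound on its lifetime
    evicted: bool = False  # assumption: edge may drop unused objects early


def lookup(entry: Optional[CachedEntry], now: float) -> str:
    """Return the cache status a request at time `now` would see."""
    if entry is None or entry.evicted:
        return "MISS"      # nothing stored (any more): go to origin
    if now - entry.stored_at >= entry.edge_ttl:
        return "EXPIRED"   # stored, but older than the TTL allows
    return "HIT"


HOUR = 3600.0
entry = CachedEntry(stored_at=0.0, edge_ttl=4 * HOUR)

within_ttl = lookup(entry, now=1 * HOUR)   # hourly retry, object still there
past_ttl = lookup(entry, now=5 * HOUR)     # retry after the TTL has passed
entry.evicted = True
after_eviction = lookup(entry, now=1 * HOUR)  # dropped early despite the TTL
print(within_ttl, past_ttl, after_eviction)
```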

Yes “basically” that is true.

The edge TTL setting is essentially how long the file remains cached, correct? I was specifically asking about how a file gets cached in the first place. I noticed in my testing that it takes 2 requests for the file from the same PoP to get the cache status to change to HIT. Does the interval between these 2 requests matter? For example, if the requests are more than 1hr apart, will it still cache the file?

Can’t see this in your last reply, but let me answer:
When the first request gets proxied through Cloudflare and does not hit the cache, Cloudflare fetches a fresh copy from the origin and serves it. But it also stores it in the POP you requested it from. So with the normal reverse-proxy CDN, the first request for every URL on every POP is basically a MISS, but that MISS is what makes Cloudflare cache the asset for this URL on this POP.
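As a rough mental model, this can be sketched in Python as a store keyed by POP and URL. This is a deliberately simplified toy, not Cloudflare's implementation: `ToyEdge` and the POP names are invented, and it ignores TTLs, eviction, and the fact that a real POP has several caches (so in practice more than one MISS can occur before the first HIT).

```python
from typing import Dict, Tuple


class ToyEdge:
    """Toy reverse-proxy cache: one independent entry per (POP, URL)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], bytes] = {}
        self.origin_fetches = 0

    def get(self, pop: str, url: str) -> str:
        key = (pop, url)
        if key in self._store:
            return "HIT"                          # served from this POP
        self.origin_fetches += 1                  # MISS: fetch from origin...
        self._store[key] = b"payload from origin" # ...and store for next time
        return "MISS"


edge = ToyEdge()
results = [
    edge.get("POP-A", "/data.bin"),  # first request at POP-A
    edge.get("POP-A", "/data.bin"),  # same POP, same URL
    edge.get("POP-B", "/data.bin"),  # different POP starts cold again
]
print(results, edge.origin_fetches)
```

Each POP has to be warmed separately in this model, which matches the need to "prime" the cache per region mentioned earlier in the thread.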

Yes, in my experience it does. If you request something twice very quickly, you sometimes get two MISSes, because Cloudflare did not have time to optimize and cache the asset in between.

That depends on multiple things:

  1. your edge TTL
  2. your plan (higher plans have a higher chance that their cache reaches the TTL)

Hi, we are looking at Cloudflare Pages, but just want to clarify its performance. Would this work like today, where the file is cached after the first request, except that the time to live essentially doesn’t expire? Or does it always serve the file from the edge (PoP location), even on the very first request?

Sorry, was away for a while.

It works on R2. It is definitely permanent storage and will be fast (the R2 itself) from the very first request. Yes, it still gets proxied through a POP, and if you set up a CNAME to a Pages domain, it will report back as being “dynamic” for most of the files, as it accesses the R2 in the background and it is a HIT there.