HEAD cache of jpg images

In our configuration we put Amazon S3 buckets behind a Cloudflare domain name, and recently
we have been experimenting with presigned URLs so that we can access files in a private bucket (one with no public access).

We found that the same signed URL for a .jpg image returns a 200 response code to both GET and HEAD requests:

import requests

url1 = 'https://some.domain.name/attachments/18/19/181975a1-dffb-67ef-96ea-4c297feefc49.jpg?AWSAccessKeyId=AAAAAAAAAAAAAAA&Signature=SIGNATURE%3D&Expires=1589359854'
requests.head(url1)
<Response [200]>

requests.get(url1)
<Response [200]>

This is not valid behaviour for S3 accessed through a signed URL, because the request method itself is part of the signature calculation: a URL signed for GET is not the same as a URL signed for HEAD.
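To illustrate why the method matters, here is a minimal sketch of how the signature is derived, assuming the URLs in this thread use S3's legacy query-string authentication (Signature Version 2), which is what the AWSAccessKeyId, Signature and Expires parameters suggest. The secret key, bucket name and object path below are hypothetical placeholders, not values from our setup.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_v2_query(method, resource, expires, secret_key):
    """Return the URL-encoded Signature value for a SigV2 presigned URL."""
    # StringToSign = Verb \n Content-MD5 \n Content-Type \n Expires \n Resource
    # Content-MD5 and Content-Type are left empty for a plain presigned URL.
    string_to_sign = f"{method}\n\n\n{expires}\n{resource}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return quote(base64.b64encode(digest).decode("ascii"), safe="")


# Hypothetical values, for illustration only.
SECRET = "hypothetical-secret-key"
RESOURCE = "/hypothetical-bucket/attachments/example.jpg"
EXPIRES = 1589359854

get_sig = sign_v2_query("GET", RESOURCE, EXPIRES, SECRET)
head_sig = sign_v2_query("HEAD", RESOURCE, EXPIRES, SECRET)

print("GET :", get_sig)
print("HEAD:", head_sig)
```

Because the HTTP verb is the first field of the signed string, the two signatures differ, so S3 itself should reject a HEAD request that carries a GET signature.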

But when we generate a signed URL for an .mp4 file, we see different responses to GET and HEAD:
url2 = 'https://some.domain.name/attachments/3c/f0/3cf054b1-e263-2cb1-1b82-2a3b374c62db.mp4?AWSAccessKeyId=AAAAAAAAAAAAAAA&Signature=OTHERSIGNATURE%2BK4SXVsNqeI%3D&Expires=1589359951'

requests.head(url2)
<Response [403]>

requests.get(url2)
<Response [200]>

In this case we get the correct behaviour, because our URL is signed only for GET.
So maybe Cloudflare caches the GET response and returns the same response for a HEAD request to the same path.

We tried turning ON Cloudflare “Development Mode” to test this hypothesis, and it turned out that yes, Cloudflare caches the GET response and serves it as a HIT for the HEAD request. This only happens with some file extensions, such as jpg, png and docx; it does not happen with mp4 files.
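A check like this can also be scripted. The sketch below assumes the cache status is exposed in the CF-Cache-Status response header; the helper names probe and head_answered_from_cache are hypothetical, and the standard library is used here instead of requests only to keep the example self-contained.

```python
import urllib.error
import urllib.request


def probe(url, method):
    """Send one request and return (status_code, CF-Cache-Status value or None)."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.headers.get("CF-Cache-Status")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. the 403 from S3) are raised as HTTPError.
        return err.code, err.headers.get("CF-Cache-Status")


def head_answered_from_cache(head_result, get_result):
    """True if HEAD succeeded like GET did and was reported as a cache HIT."""
    head_status, head_cache = head_result
    get_status, _ = get_result
    return (
        head_status == 200
        and get_status == 200
        and (head_cache or "").upper() == "HIT"
    )
```

Calling probe(url1, "GET") first and then probe(url1, "HEAD"), and passing both results to head_answered_from_cache, should distinguish the .jpg case (200 with a HIT) from the .mp4 case (403 from S3).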

Can you please point me to where this is documented? Exactly which file extensions are cached this way? Is there a way to turn this off?

https://support.cloudflare.com/hc/en-us/articles/200172516-Understanding-Cloudflare-s-CDN#h_a01982d4-d5b6-4744-bb9b-a71da62c160a

With Page Rules. Searching will give you the details on this.

If you don’t want to cache the assets Cloudflare caches by default, you can use Page Rules, as Sandro said. Most (all?) third-party caching layers convert HEAD requests to GET requests for cacheable content. A HEAD request isn’t particularly useful in that context: either the content is in cache or it isn’t. If it isn’t in cache and will ultimately be cacheable, just fetching it is more efficient.

Do browsers even send HEAD requests anymore? Is there an actual problem with how it works in the ‘real world’, as opposed to the difference you identified in a test?

Browsers do still send HEAD requests when JavaScript issues them explicitly; CORS preflight checks, however, use OPTIONS rather than HEAD.

We make HEAD requests to Amazon S3 files to check whether a file exists and to read its size and metadata after a client-side upload (boto3 does that under the hood).

Got it! Thanx a lot!
