Private Azure blob containers and Cloudflare caching

Can private (no anonymous access) Azure blob containers work with Cloudflare, or do you have to set your Azure blob containers to public access for Cloudflare to cache the files?

Today we generate the URL of the blob asset on button click; the URL is appended with an expiry time and token, as there is no public access to the files.

What are some options to work with private azure blob containers?

What would private access require? Cloudflare basically just proxies your content. If you require some additional headers, you could insert them via Workers.
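For example, a minimal Worker that forwards the request and injects an extra header might look like this (the header name and value are just placeholders):

```ts
// Minimal sketch of a Worker that adds a header before the request
// continues to the origin; the header name/value are hypothetical.
export default {
  async fetch(request: Request): Promise<Response> {
    const headers = new Headers(request.headers);
    headers.set("x-custom-auth", "example-value"); // placeholder header
    return fetch(new Request(request, { headers }));
  },
};
```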

We generate the URL on the fly with query string parameters that contain the expiry time of the URL and a valid token.

Example:

Instead of https://documents.domain.com/photos/032719_130416_42.jpg

https://documents.domain.com/photos/032719_130416_42.jpg?sp=r&st=2019-11-25T17:49:54Z&se=2019-11-26T01:49:54Z&spr=https&sv=2019-02-02&sr=b&sig=w%2F7jkbCFQi7N2L1i8n%2FzBUWi1gjawqiQxuyk73o%2Bsw4%3D

The additional query string parameters are generated against our Azure storage account using the access keys, etc.
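For reference, those parameters are sp (permissions), st (start time), se (expiry), spr (allowed protocol), sv (service version), sr (resource type, b = blob) and sig (the HMAC-SHA256 signature). A rough sketch of how such a URL can be produced with the official @azure/storage-blob SDK (the account key source is a placeholder):

```ts
import {
  BlobSASPermissions,
  SASProtocol,
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
} from "@azure/storage-blob";

// Shared-key credential for the storage account; the key env var is a placeholder.
const credential = new StorageSharedKeyCredential(
  "bkdemostorage",
  process.env.AZURE_STORAGE_KEY ?? ""
);

// Read-only SAS valid for eight hours, mirroring the sp/st/se/spr values
// in the example URL above.
const sas = generateBlobSASQueryParameters(
  {
    containerName: "photos",
    blobName: "032719_130416_42.jpg",
    permissions: BlobSASPermissions.parse("r"),            // sp=r
    startsOn: new Date(),                                  // st
    expiresOn: new Date(Date.now() + 8 * 60 * 60 * 1000), // se
    protocol: SASProtocol.Https,                           // spr=https
  },
  credential
).toString();

const signedUrl = `https://documents.domain.com/photos/032719_130416_42.jpg?${sas}`;
```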

These are just query string parameters.

As long as you can access the address publicly, it should work on Cloudflare.

The query string parameters are not part of the URL until the asset is requested. When the download button is clicked, the full URL is generated at that time. Otherwise, the URL of the asset is just something like /photos/abc.jpg, which is not accessible because the container requires a valid shared access signature.

How can the files be cached while stored in private Azure blob storage when there is no shared access signature appended to the URL? Are you saying that we need to generate these unique query string parameters inside Workers? Can we program against Azure blob storage from within Workers to generate a shared access signature?

Or are you saying that when the new URL is generated with query string parameters and accessed for the first time, it will then be cached, instead of Cloudflare pre-caching?

Thanks

You can cache on Cloudflare either based on the query string or ignoring it. However, the request has to go through Cloudflare; if you do something with the query string outside of the browser, that will not reach Cloudflare and hence won't be cached either.

Can you provide an example link?

Example: This link will expire soon.

https://bkdemostorage.blob.core.windows.net/photos/032719_130416_42.jpg?sp=r&st=2019-11-25T19:08:06Z&se=2019-11-26T03:08:06Z&spr=https&sv=2019-02-02&sr=b&sig=p9FF169norWeU7tYKkeP5v%2FeMxkQm03G6vzhFWaWfaU%3D

The actual URL of the asset is https://bkdemostorage.blob.core.windows.net/photos/032719_130416_42.jpg, but the container in which the file is stored is not public. We generate a signature against the storage account and append it to the URL, along with an expiry time, to access the file.

So how can abc.jpg sitting in Azure blob storage be cached when it cannot be accessed without a SAS? The SAS is appended in our application when the download button is clicked.

So basically the values in the query string are the authorisation data?

You could simply set that up as a CNAME pointing to bkdemostorage.blob.core.windows.net and then access it via your domain just like that.

You’d still need the query string data though.

Is the asset going to be cached after it's accessed via the authorization data appended to the URL by the end user? Because otherwise Cloudflare can't access the blob and start pre-caching.

Only if your caching level is set to ignore the query string; otherwise, it will still require the identical query string for the response to be served from the cache.
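To illustrate, "ignore the query string" can also be done explicitly in a Worker by using the bare path as the cache key, so every signed variant of the same blob maps to a single cached copy (a sketch, not production code):

```ts
// Cache-key sketch: strip the query string so all SAS variants of the
// same blob share one cache entry. The very first request still needs
// a valid SAS to succeed against the origin.
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);
    const cacheKey = new Request(url.origin + url.pathname, { method: "GET" });
    const cache = caches.default;

    let response = await cache.match(cacheKey);
    if (!response) {
      response = await fetch(request);
      if (response.ok) {
        ctx.waitUntil(cache.put(cacheKey, response.clone()));
      }
    }
    return response;
  },
};
```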

The query string values are dynamic and change every time someone accesses the URL. The photo abc.jpg is the same, but the URL used to access it is different every time it is requested.

The question is, do you want it cached at all and if so, based on what and for whom?

We have tons of PDF documents in private blob storage that we want to cache and distribute across the globe. These documents are accessed by dynamically generating an authorization token and expiry time, which are different every time someone clicks on a document. I just want to understand, from a Cloudflare perspective, how to pre-cache these documents when they are in private Azure storage.

The first thing is: how do we cache these when they are in a private blob container?

Cloudflare does not care what mode that is set to. It will cache whatever it receives.

The question still is what you want to cache and based on what. If you cache everything regardless of the query string, your authorisation string becomes pretty useless.

We want to cache all these documents based on the document's URL.
So since our Azure blob container is private, Cloudflare won't be able to access the documents by URL.

So, to your point, if we want to cache all documents, we need to make our Azure blob container public, get rid of the authorization parameters, and let Cloudflare cache everything.

Not necessarily, but if you cache it on Cloudflare (and do not take the authorisation parameters into account) it will be publicly cached and your authorisation will be rendered somewhat pointless.

As I said before, it really comes down to what you want to cache and for whom. Can you provide several examples?

The goal is to distribute these documents across the globe so users can download them from the nearest data center.

Today we have uploaded these documents to another CDN provider, and they have distributed our documents across the globe. We generate URLs for the documents against that CDN provider with an authorization token, and users get local copies.

We would like to cut over to Cloudflare and are trying to see if a similar use case is possible. We also don't want to give public access to these documents, meaning we want to put an expiration on the URL, as our current CDN provider supports tokenized URLs.

And you do not need to have the authorisation take place?

We prefer to continue having authorization take place.

Well, that's the issue I have been talking about all along. If you cache the files, the authorisation won't run any longer. You could still cache with the query string, in which case the files will be cached per authorisation, but as that expires, that probably won't work either.

You could possibly do something with Workers, but that really depends on your requirements, and you have yet to name them.
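For instance, here is one possible design, sketched under two assumptions: a long-lived, read-only SAS (ORIGIN_SAS) stored as a Worker secret and used only between Cloudflare and Azure, and a hypothetical isTokenValid() standing in for whatever per-user check you want to keep. Authorisation then runs on every request, while the file itself is cached once under its bare path:

```ts
interface Env {
  ORIGIN_SAS: string; // hypothetical secret: long-lived read-only SAS
}

// Placeholder for your own per-user check (expiring token, signed
// cookie, etc.); the details depend on your application.
function isTokenValid(url: URL): boolean {
  return url.searchParams.has("token");
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);
    if (!isTokenValid(url)) {
      return new Response("Forbidden", { status: 403 });
    }

    // Cache on the bare path so every authorised user shares one copy.
    const cacheKey = new Request(url.origin + url.pathname, { method: "GET" });
    const cache = caches.default;
    let response = await cache.match(cacheKey);

    if (!response) {
      // Cache miss: fetch from Azure with the server-side SAS.
      const origin = `https://bkdemostorage.blob.core.windows.net${url.pathname}?${env.ORIGIN_SAS}`;
      response = await fetch(origin);
      if (response.ok) {
        ctx.waitUntil(cache.put(cacheKey, response.clone()));
      }
    }
    return response;
  },
};
```

Workers can also compute a SAS themselves (HMAC-SHA256 via crypto.subtle), but keeping a server-side SAS as a secret avoids re-implementing Azure's string-to-sign format in the example.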