I’ve made a Cloudflare Worker that routes requests based on the visitor’s geographical location. For instance, US visitors go to the default server (example.com) and European visitors get pages served from the eu.example.com origin, while the URL shown in the browser remains example.com. This way I cut down on geographical latency.
It works fine with Workers, which is pretty cool.
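For anyone curious, the routing described above can be sketched roughly like this. The EU country list is an illustrative subset and `pickOrigin` is an invented helper name; `request.cf.country` is the Workers property that exposes the visitor’s country code:

```javascript
// Sketch of geo-routing in a Cloudflare Worker. The country list is an
// illustrative subset; pickOrigin is an invented helper name.
const EU_COUNTRIES = new Set(["DE", "FR", "NL", "BE", "ES", "IT"]);

function pickOrigin(country) {
  return EU_COUNTRIES.has(country) ? "eu.example.com" : "example.com";
}

// In the worker itself, request.cf.country holds the visitor's country code:
// addEventListener("fetch", (event) => {
//   const url = new URL(event.request.url);
//   url.hostname = pickOrigin(event.request.cf.country);
//   event.respondWith(fetch(url.toString(), event.request));
// });
```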
Now I’m looking to shield all those origin servers (with duplicate content) from regular visitors. Is there a way to do that?
From what I understand, I cannot simply redirect requests for eu.example.com to example.com, because that would also redirect the Workers themselves, which fetch assets from eu.example.com. Do Workers have some kind of special signature (like a user agent) that I can filter on before triggering a redirect? Or is there a better way to achieve my goal?
Only accept connections to your origin from Cloudflare IP addresses. You can find a list of Cloudflare IPs that you would need to whitelist here: IP Ranges
Configure your origin to use TLS Client Authentication in order to only accept requests from Cloudflare. We call this Authenticated Origin Pulls, which you can find under the SSL/TLS app on your dashboard.
Use Argo Tunnel. This involves installing a daemon on your origin which creates a persistent outbound connection to Cloudflare, through which all requests are routed. This allows you to ignore all inbound connections at your firewall.
Argo Tunnel (now called Cloudflare Tunnel) is likely the most convenient solution. If you decide to go with the other methods, I’d recommend implementing both an IP whitelist and Authenticated Origin Pulls.
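Assuming the origin is a regular web server running nginx, the first two options (the IP whitelist and Authenticated Origin Pulls) might look roughly like this. This is only a sketch: the two IP ranges shown are examples (whitelist the full, current list from the IP Ranges page), and the certificate path is made up:

```nginx
server {
    listen 443 ssl;
    server_name eu.example.com;
    # (ssl_certificate / ssl_certificate_key omitted for brevity)

    # Option 1: only accept connections from Cloudflare's published ranges.
    # Two example ranges shown; add the full list.
    allow 173.245.48.0/20;
    allow 103.21.244.0/22;
    deny  all;

    # Option 2: Authenticated Origin Pulls, requiring Cloudflare's client
    # certificate on every origin pull.
    ssl_verify_client on;
    ssl_client_certificate /etc/nginx/certs/origin-pull-ca.pem;
}
```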
This is something I’ve already done. But since the multiple origin servers are orange-clouded, with their traffic being served through Cloudflare, this isn’t possible.
(I could turn off routing traffic through Cloudflare for the origin servers, but that comes with a performance penalty, since content would then no longer be cached.)
Subrequests made with the Fetch API are cached at Cloudflare’s edge. This means that if you load a data file in your worker from a URL that is appropriately configured to allow caching, the response can be served from Cloudflare’s cache on subsequent requests.
So it’s not a problem if my origin servers are not ‘orange clouded’, since the worker will still cache their content. That’s great!
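As a sketch of such a subrequest: `cacheTtl` and `cacheEverything` are Workers-specific `fetch` options, while `buildOriginRequest` is an invented helper, and the hostname and 30-minute TTL are illustrative values from this thread:

```javascript
// Hypothetical helper showing how a worker subrequest can opt into edge
// caching. cacheTtl/cacheEverything are Workers-specific fetch options.
function buildOriginRequest(urlString) {
  const url = new URL(urlString);
  url.hostname = "eu.example.com"; // grey-clouded origin from this thread
  return {
    url: url.toString(),
    init: {
      cf: {
        cacheTtl: 1800,        // let the cache expire naturally after 30 min
        cacheEverything: true, // cache regardless of origin Cache-Control
      },
    },
  };
}

// In the worker: const r = buildOriginRequest(event.request.url);
//                event.respondWith(fetch(r.url, r.init));
```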
One thing left: if I purge everything through Cloudflare’s API, does that also invalidate the Worker cache?
(The doc on purging says it removes all resources, so I’m pretty sure that includes Workers, but just to check.)
These options are not possible; my origin servers are S3 buckets. Sorry, should have mentioned that earlier!
Great question. Right now, the answer is no – responses to subrequests to grey-clouded domains cannot yet be purged. This is a bug, and we have a fix in the works.
Ah, perhaps request signing would be useful, then? That is, make the buckets private and give read-only access to a particular AWS secret key, then implement AWS request signing in the worker with that secret key. The WebCrypto interface provides the necessary HMAC-SHA1 algorithm, or you could try bundling a library like aws4 with webpack to make things a bit easier.
Ah well, good news that a fix is in the works. I’ll test with short cache periods (like 15–30 minutes) first, so that the cache can expire naturally. Obviously I hope to cache for longer in the future.
Where can I keep updated on when that fix is released? I’ll have to change my setup then. I’m already a daily visitor to the Cloudflare blog.
Wow, this is a really insightful comment and suggestion! I’ll look into this; it might be a fun project for the weekend.
I’ll follow up here when it’s fixed. We’re working on setting up a better way to manage workers-related announcements like this, so hopefully I’ll have a better answer soon.
Thanks! If you end up not using aws4, you might be interested in this signed requests recipe to get a feel for how to work with the WebCrypto API. Note that that recipe is written from the perspective of using a worker to verify signed requests, but it also has an example of how to generate signed requests. The AWS system is, of course, more involved.
Might you (or another person reading this) know which of the following options is fastest for my worker? I couldn’t figure it out myself with Googling.
1. Add the S3 buckets as CNAMEs in Cloudflare’s DNS with automatic TTL.
2. Add the S3 buckets as CNAMEs in Cloudflare’s DNS with the TTL set to one day.
3. Don’t use DNS resolution at all; hard-code the S3 bucket URLs directly in the Worker’s code.
The benefit of #2 is that it caches the DNS lookup in the Cloudflare data centre, saving this lookup for subsequent requests.
The benefit of #3 is that there’s one step less in DNS resolution, albeit with an unknown TTL.