Global, decentralized leader selection with Workers KV, was: Fetch() with a client certificate?

#1

One of the services I’m trying to connect to from my worker requires additional authentication. Either a JWT, or a client certificate.

It’s pretty straightforward to run a server behind an Argo tunnel to provide the JWT, but so far I’ve been able to meet the design goals without having to build / host / maintain our own server. I’d prefer to accomplish this connection entirely from within the Worker.

With that being said, I can still use JWT, however, I’m having trouble finding a race-free way of caching the token (KV store), detecting pending expiration, and re-issuing the token.

So, until I nail down a race-free, decentralized way of managing the JWT, I was hoping I could have my worker use client certificate (mTLS) authentication for the outgoing connection.

Has anyone successfully done this? I imagine I just need to assign the key/certificate during the fetch() call, but my google-fu seems to be lacking…

Thanks,

Jason.

#2

A global variable won’t work for this?

#3

The Fetch API doesn’t expose a way to mutate the underlying TLS connection.

#4

What do you mean race-free, exactly?
What issues are you running into saving the JWT in Workers KV?

EDIT: If you’re writing to the same key, then you might have a race condition - but you also can only write to a single key once per second.

#5

No, because the JWT must live for at least 20 minutes, and no more than 60 minutes. Therefore, a) it must be cached, and b) it must be changed at regular intervals.

#6

Dammit.

#7

The service’s JWT requirements are that a token a) must be used for at least 20 minutes (you can’t create a new token for each request), and b) must expire within 60 minutes.

The race occurs when an instance of my Worker detects that a token is up for renewal. If it initiates the renewal process, other instances of my worker at the same colo, and at other colos have no way of knowing that the token renewal process has begun. As a result, they’ll also detect that the token needs renewed, and begin the process. At best, I’ll end up with several tokens. Unfortunately, the service rejects multiple valid tokens by the same keyid. :frowning:

Ergo, the race is among the Workers detecting that the token is up for renewal, and the problem is getting only one instance of token renewal to kick off.

Thanks,

Jason.

#8

It would be nice if, alongside Workers KV, CF also provided a small, synchronous data store for cryptographic keys and tokens. Renewal of stored items within this virtual keystore could be automated by a user-defined script. So, a token-renewal Worker script.

Thanks,

Jason.

#9

Ah, I see the use-case now. Yeah, it’s impossible to do certain things at this stage, but it seems Cloudflare is working on a “secrets” database (found in hidden docs). A queue system and atomic counting would be nice too. I took this up here. So I guess they’re aware already.

I think a simple and quick solution would be to make it possible to “lock” a specific key from being written to when it received the first renewal. I guess this could also be hacked together somewhat if you use a unix timestamp in the KV key, but it would still be inconsistent.

#10

Well, I’d say the atomic counting goes against the whole REST concept, but I’m on board with the secrets database. Especially if we could launch custom scripts from it for secret renewal…

#11

I guess, just a centralized secret store that’s writable by API is enough for your use-case?

Can’t you cache the JWT in a KV key and renew it via CF API (non-worker)?
Or does it have to be renewed by a worker?

#12

Sure, I can cache the JWT in KV, but I’m trying to avoid a) maintaining another box, and b) ensuring said box is up 24/7. Never mind the security of it, updates, etc. So, I’d prefer to have a feature that allows me to do this with only the Worker.

#13

Hm, maybe set up an AWS S3 bucket only for this and use a simple fetch request from the worker to get it? Then you don’t need to maintain pretty much anything, and you’ll keep it secure if you encrypt the token before storing it on S3.

Plus, you can use S3 for other things like storing images or other “secrets”.
At least until CF have it figured out.

Here’s a digital ocean spaces sample you can use if you can’t find a working S3 example.

#14

That seems like a good idea. And maybe create a .lock file when in process. If that file exists, workers know it’s already getting renewed.

#15

S3 and lock files help reduce the chances of a race, but it’s still racy. Even on a local filesystem, lock files are racy. Better to have a mutex-locked variable within a single process, or to use an atomic (CPU instruction) modifier on a state variable.

Basically, roll my own server. :-/

#16

So my understanding of Workers is deepening. I’ve spent some time doing distributed computing (no control node), and I think I see what’s missing.

Iff CF’s vision for Workers is to easily deploy a global network by “just writing the code”, then there needs to be some form of decentralized synchronization mechanism. The most useful, and simplest that I’ve seen/used are Lamport Clocks.
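For readers who haven’t met Lamport clocks, here is a minimal sketch of the idea, not anything Workers provides. The class and method names are made up for illustration. Each process keeps a counter, bumps it on every local event or send, and on receive jumps past the timestamp carried by the incoming message.

```javascript
// Minimal Lamport logical clock (illustrative names, not a Workers API).
class LamportClock {
  constructor() {
    this.time = 0;
  }

  // Call for a local event or just before sending a message;
  // the returned value is the timestamp to attach to the message.
  tick() {
    this.time += 1;
    return this.time;
  }

  // Call when a message stamped with `remoteTime` arrives.
  receive(remoteTime) {
    this.time = Math.max(this.time, remoteTime) + 1;
    return this.time;
  }
}
```

The point for this thread is that the counter has to survive between events, which is exactly the per-process state a Worker can’t rely on today.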

Unfortunately, in order to consider that, two changes need to be made to Workers. You don’t need both, but each change could be independently useful.

  1. Persistent processes: permits storing state in memory
  2. Persistent storage / per process: permits storing state to ‘disk’

Either one of these would enable Workers to deconflict, change state once (token refresh for me :slight_smile: ), order events, etc. The size wouldn’t need to be large, and since it would be per-process, CF deliberately wouldn’t synchronize this backing store across PoPs.

Note that “per process” as I write above may not be technically accurate. I’m not familiar with the guts of CF infra. Basically, the backing storage for #2 would be a file in the VM (xen/kvm/etc type) filesystem.

The complication is that if there are multiple instances of a Worker per ‘VM’, they would each need their own backing store… Hence suggesting #1. That avoids the issue, at the cost of a constant memory consumption.

On the other hand, if the backing store holds all state for an instance, then an initStore() function within the Worker could be called if handed a blank store. This would enable CF to randomly assign existing backing stores to Workers as needed, and hand in blank ones if the existing ones are all in use.

</rambling>

#17

Honestly I think they need to implement everything you get with Erlang and the beam.

#18

Basically message passing, gen servers and ets give you just about everything you need.

#19

A lock-free implementation of this scenario is for every worker to independently calculate the same exp. We need an offset of x minutes where 20 < x < 60 and x < 60 / 2 (say 25m). Then each worker can calculate how many 25-minute periods have passed since the epoch:

let p = Math.floor(Date.now() / (1000 * 60 * 25))

and add another 25 minutes to it:

let exp = (p + 1) * 60 * 25

This way all workers calculate the same expiration although there is a negligible chance for racing every 25 minutes.
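Putting the two lines above together, a rough sketch might look like this. The function name tokenWindow and the iat field are my own additions for illustration; only p and exp come from the snippet above. Times are returned in seconds, which is what JWT exp/iat claims use.

```javascript
// Period length in minutes; 25 is the example value from the post above.
const PERIOD_MINUTES = 25;
const PERIOD_SECONDS = PERIOD_MINUTES * 60;

// Returns the window containing `nowMs` (milliseconds since the epoch).
// Every worker calling this within the same window gets the same values.
function tokenWindow(nowMs) {
  const p = Math.floor(nowMs / (1000 * PERIOD_SECONDS));
  return {
    period: p,                      // index of the current window
    iat: p * PERIOD_SECONDS,        // window start, in seconds (assumed claim)
    exp: (p + 1) * PERIOD_SECONDS,  // window end, in seconds
  };
}
```

In a worker you would call tokenWindow(Date.now()). Identical claims only give a byte-identical token if the signing algorithm is deterministic, so that part is worth checking for whatever the service requires.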

#20

I think it will race even if you do this, you’d have to test it on global scale to really know though.

Seems to me that the renewal is what causes the race condition. So what if we use the unix time as the key for renewal, since each worker can predict the exact time when a new key should be available?

So the new JWT would already exist when the workers need it.
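A hedged sketch of what that could look like, assuming a KV namespace binding passed in as kv and a hypothetical mintToken() callback that creates the JWT for a given period. The key prefix "jwt:" and the 25-minute period are illustrative choices, not anything the service dictates.

```javascript
const PERIOD_SECONDS = 25 * 60;

// Key for the period containing `nowMs`; offset = 1 gives the next
// period's key, which is where a token could be written ahead of time.
function tokenKey(nowMs, offset = 0) {
  const p = Math.floor(nowMs / (1000 * PERIOD_SECONDS)) + offset;
  return `jwt:${p}`;
}

// Reads the token for the current period, minting it on a miss.
// `kv` is assumed to behave like a Workers KV binding (get/put).
async function getToken(kv, nowMs, mintToken) {
  const key = tokenKey(nowMs);
  let token = await kv.get(key); // null when the key does not exist
  if (token === null) {
    token = await mintToken(key);
    // Let old periods expire on their own after two windows.
    await kv.put(key, token, { expirationTtl: 2 * PERIOD_SECONDS });
  }
  return token;
}
```

This doesn’t remove the race by itself: two workers that both miss on the same key can still both mint, and KV writes take time to propagate between locations. It only narrows things down to one well-known key per period.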