Global, decentralized leader selection with Workers KV, was: Fetch() with a client certificate?

#21

The typical solution in distributed computing is to hold a leader election. Basically, at exp - (40 * 60), i.e. 40 minutes before the token expires, all Workers detecting the pending expiry register a unique ID in the KV store. (This requires listing a subset of keys, which the Workers KV API can’t do yet.) For example:

/* NOTE: Horrible pseudo-code :) */
value = /* microsecond-level current timestamp */
hash = HASH(rayid + value)
key = "election/" + token_exp_ts + "/" + hash
KV_STORE.put(key, value, /* expire in 1 minute */)

/* wait 11 seconds for KV to propagate (assumes global sync within 10 seconds) */
sleep(11)

/* grab the list of registered candidates */
var candidates = KV_STORE.keys("election/" + token_exp_ts + "/*")

/* sort the list on their recorded timestamps (KV_STORE values);
   if there's a tie for first, break the tie by alphabetical sorting of hash (KV_STORE key) */
candidates.sort(/* by KV_STORE.get(candidate), then by hash */)

/* If I'm the winner, go renew the token */
if (candidates[0] == key) { /* renew the JWT */ }
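A minimal sketch of the key layout and election timing from the pseudo-code above, in plain JavaScript. It assumes token_exp_ts is the JWT "exp" claim (a Unix timestamp in whole seconds); the function names are hypothetical helpers, not part of any Cloudflare API.

```javascript
// Sketch only. Assumes tokenExpTs is the JWT "exp" claim in whole seconds.

// The election opens 40 minutes before expiry: exp - (40 * 60).
const ELECTION_LEAD_S = 40 * 60;

// True once "now" (in seconds) has reached the election window for this token.
function electionOpen(nowS, tokenExpTs) {
  return nowS >= tokenExpTs - ELECTION_LEAD_S;
}

// Prefix shared by every candidate for one token: "election/<token_exp_ts>/".
// The trailing slash keeps "election/17/" from also matching "election/170/".
function electionPrefix(tokenExpTs) {
  return `election/${tokenExpTs}/`;
}

// One candidate's key: "election/<token_exp_ts>/<hash>".
function candidateKey(tokenExpTs, hash) {
  return electionPrefix(tokenExpTs) + hash;
}
```

A Worker would call something like `electionOpen(Math.floor(Date.now() / 1000), exp)` on each request and only register a candidate key once it returns true.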

Each Worker sorts the list by timestamp, breaking a tie for first place by picking the alphabetically first hash (HASH(rayid + timestamp)). The winner is the “chosen one”, and that Worker then proceeds to renew the JWT.
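The selection rule can be sketched as a pure function. This assumes each Worker has already read the candidates back as `{ key, value }` pairs, where `value` is the registered timestamp stored as a numeric string; `pickLeader` and `amILeader` are hypothetical names.

```javascript
// Keys look like "election/<token_exp_ts>/<hash>"; the hash is the last segment.
function hashOf(key) {
  return key.slice(key.lastIndexOf("/") + 1);
}

// Lowest timestamp wins; a tie for first goes to the alphabetically first hash.
// Returns the winning key, or null if nobody registered.
function pickLeader(candidates) {
  if (candidates.length === 0) return null;
  const sorted = [...candidates].sort((a, b) => {
    const dt = Number(a.value) - Number(b.value); // microsecond timestamps fit below 2^53
    if (dt !== 0) return dt;
    const ha = hashOf(a.key);
    const hb = hashOf(b.key);
    return ha < hb ? -1 : ha > hb ? 1 : 0; // plain code-unit order, not locale-dependent
  });
  return sorted[0].key;
}

// A Worker renews the JWT only if its own key won.
function amILeader(myKey, candidates) {
  return pickLeader(candidates) === myKey;
}
```

The tie-break deliberately uses `<` rather than `localeCompare`: every Worker must reach the same answer independently, so the ordering should not depend on the runtime's locale.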

Note:

  1. This depends on CF:
    a. guaranteeing global sync of KV within 10 seconds
    b. providing a KV API call to list keys or a subset thereof
    c. permitting Worker scripts to live for at least 12 seconds of wall-clock time
    d. maintaining fairly accurate clock sync across its network (NTP-level accuracy)
  2. No security guarantee: a malicious Worker could write a fake timestamp to all but guarantee its selection.
  3. There is a remote possibility that HASH(rayid + timestamp) collides. This could be mitigated by including the x-real-ip in the content of the hash…
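Note 3's mitigation could look roughly like this. Assumptions: SHA-256 through the Web Crypto API (`crypto.subtle.digest`, available in Workers and recent Node.js), and a `|` separator that none of the fields contain; the separator keeps field boundaries unambiguous, so `"ab" + "c"` and `"a" + "bc"` no longer hash the same input. Function names are hypothetical.

```javascript
// Join the fields with a separator so different splits give different inputs.
function hashInput(rayId, timestamp, realIp) {
  return [rayId, String(timestamp), realIp].join("|");
}

// SHA-256 over rayid, timestamp and client IP, hex-encoded so it can sit
// at the end of a KV key and sort as plain text.
async function candidateHash(rayId, timestamp, realIp) {
  const bytes = new TextEncoder().encode(hashInput(rayId, timestamp, realIp));
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Inside a Worker, `realIp` would come from `request.headers.get("x-real-ip")`, as the note suggests.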

[edit]: s/25/40/g; removed the note about Workers presuming non-candidacy after a certain window. That note presumed a large amount of traffic, but this needs to work even with one connection per week (or at least with gaps between connections longer than the JWT validity window).

#22

That works already; it's 15 s as far as I remember, but it was supposed to be increased.

#23

Yes, and I can probably (ick) call the KV web API to list the keys until CF adds that call to the Workers KV API.
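A rough sketch of that workaround. Assumptions: the Cloudflare v4 REST endpoint `GET /accounts/{account_id}/storage/kv/namespaces/{namespace_id}/keys` with `prefix` and `cursor` query parameters, a response carrying `success`, `result[].name` and `result_info.cursor`, and API-token auth via `Authorization: Bearer`; verify these against the current API reference. The account ID, namespace ID and token are placeholders you supply.

```javascript
const API_BASE = "https://api.cloudflare.com/client/v4";

// Build the list-keys URL for one page of results.
function listKeysUrl(accountId, namespaceId, prefix, cursor) {
  const url = new URL(
    `${API_BASE}/accounts/${accountId}/storage/kv/namespaces/${namespaceId}/keys`
  );
  url.searchParams.set("prefix", prefix);
  if (cursor) url.searchParams.set("cursor", cursor);
  return url.toString();
}

// Collect every key name under `prefix`, following result_info.cursor
// until it comes back empty. fetchFn defaults to the global fetch (present
// in Workers); it is a parameter so pagination can be tested offline.
async function listKeys(accountId, namespaceId, apiToken, prefix, fetchFn = fetch) {
  const names = [];
  let cursor = "";
  do {
    const res = await fetchFn(listKeysUrl(accountId, namespaceId, prefix, cursor), {
      headers: { Authorization: `Bearer ${apiToken}` },
    });
    const body = await res.json();
    if (!body.success) throw new Error(`KV list failed: ${JSON.stringify(body.errors)}`);
    for (const k of body.result) names.push(k.name);
    cursor = (body.result_info && body.result_info.cursor) || "";
  } while (cursor);
  return names;
}
```

With a prefix such as `"election/" + token_exp_ts + "/"`, the returned names are the candidate keys that the pseudo-code's `KV_STORE.keys(...)` stands in for.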

#24

That should also work from within the Worker itself…