KV - External backups?

Are there any plans for official backups features, like download a complete KV namespace?

Or is this something we need to build ourselves?

I know we can use the API to list and dump KV, but there’s a lot of work needed around that to make it reliable as an automated backup - not to mention the costs.

I’ll second that, with an additional feature request: A snapshot cookie. So that the complete download is of a consistent view of the KV store at an instant in time. Similar to how ZFS’ snapshot feature works.

That would be a dream-scenario.

It basically means storing KV versions, which could also enable app persistent history/time machine.

I suspect they’re already doing something similar, since they claim that writes are committed in order. That means there is, at the least, a timestamp associated with each write. Hypothetically, it would work like the cursor in the web API for long lists (listing all keys, for example). The cursor most likely encodes how much to send and what point to start at, as this preserves the “REST-i-ness” of the API.

A similar ‘cursor’ for the snapshot could basically encode “don’t send anything written after timestamp X”. So, I don’t think it’s that far out there, but I also don’t think it’s “deliver it in a week” either.

The hardest part of it is “this value is too new, so I need to send the previous version”… and the answer to that depends heavily on what’s under the hood.
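The two ideas above (a timestamp acting as a snapshot cursor, and falling back to an older version when a value is “too new”) can be sketched with a hypothetical versioned store. To be clear, nothing here reflects how KV actually works internally - `VersionedStore`, `getAt`, and the timestamp scheme are all made up for illustration:

```javascript
// Hypothetical sketch: a versioned store where a snapshot "cursor" is just a
// timestamp, and reads ignore anything written after it.
class VersionedStore {
  constructor() {
    this.versions = new Map(); // key -> [{ ts, value }], sorted by ts ascending
  }
  put(key, value, ts) {
    if (!this.versions.has(key)) this.versions.set(key, []);
    this.versions.get(key).push({ ts, value });
  }
  // "Don't send anything written after timestamp snapshotTs": return the
  // newest version at or before the snapshot, i.e. the "previous version"
  // when the latest write is too new.
  getAt(key, snapshotTs) {
    const history = this.versions.get(key) || [];
    let result;
    for (const { ts, value } of history) {
      if (ts <= snapshotTs) result = value;
    }
    return result;
  }
}

const store = new VersionedStore();
store.put("config", "v1", 100);
store.put("config", "v2", 200);

// A snapshot taken at ts=150 sees v1, even though v2 exists by now.
console.log(store.getAt("config", 150)); // "v1"
console.log(store.getAt("config", 250)); // "v2"
```

The nice property is that the snapshot cursor is stateless: any reader holding the same timestamp gets the same consistent view.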


Yeah, I see it being plausible to implement for sure, since this is common in other KV-databases.

The most important thing for me right now is addressing deletes, because they shouldn’t be permanent. And also that we get the batch-retrieve-keys feature soon; that way I could write a temporary backup solution, and maybe it’s better anyway, since then we can use prefixes.

Did you not see this?

What if Workers could access your SQL/NoSQL database?

KVSTORE.list({"prefix": "...." }) goes a long way towards alleviating this problem. You just store data in separate keys under the same prefix. This removes the overwrite problem by just writing to different keys, and subsequently listing a subset of them.

I’d eventually like “glob” and limited “regex” support, but it’s not critical atm.

(sorry for linking to my own mention. I can’t seem to find the original announcement…)
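The prefix pattern described above can be sketched like this. Note that `mockKV` is an in-memory stand-in I made up for illustration, not the real binding (the real `list`/`put` methods return Promises; this sketch is synchronous for brevity):

```javascript
// Minimal in-memory stand-in for a KV namespace, to illustrate writing each
// revision to its own key and then listing a subset by prefix.
const mockKV = {
  data: new Map(),
  put(key, value) {
    this.data.set(key, value);
  },
  list({ prefix = "" } = {}) {
    const keys = [...this.data.keys()]
      .filter((k) => k.startsWith(prefix))
      .map((name) => ({ name }));
    return { keys, list_complete: true };
  },
};

// Instead of overwriting "user:42", write each revision to a distinct key.
mockKV.put("user:42:v1", "{...}");
mockKV.put("user:42:v2", "{...}");
mockKV.put("user:99:v1", "{...}");

const { keys } = mockKV.list({ prefix: "user:42:" });
console.log(keys.map((k) => k.name)); // ["user:42:v1", "user:42:v2"]
```

Because every write lands on a fresh key, older revisions survive, and a prefix list gives you all revisions of one logical value in a single call.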

I’ve seen it, but I don’t consider it a solution: it multiplies write and list usage just to keep an index of which version of each key is “current”.

No, KVSTORE.list({"prefix": "...."}) is a recent addition to the Workers KV API. You can list a subset of your keys with one call - IOW, your “batch-retrieve-keys” feature. We’re using it in production, and it works great so far.

If that’s not it, what does your “batch-retrieve-keys” feature refer to?

This feature is on our radar, though I cannot promise when it will be implemented.

“backup” can also mean a few different things in this context; one thing you can do today is “back up” a namespace into another one. We shipped a tool for this: https://github.com/cloudflare/kv-worker-migrate/

However, what you’re talking about is another (and I’d argue, more important) form of backing up, which is exfiltrating your data. This is a very real need, and it’s hard to build it yourself today, since we don’t support bulk reads yet.
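Without bulk reads, the do-it-yourself export boils down to one list pass plus one read per key, so a namespace with N keys costs at least N+1 operations (more once list pagination kicks in). A rough sketch, again using a made-up synchronous in-memory `mockKV` rather than the real Promise-based API:

```javascript
// In-memory stand-in for a KV namespace with a couple of entries.
const mockKV = {
  data: new Map([
    ["a", "1"],
    ["b", "2"],
  ]),
  get(key) {
    return this.data.get(key);
  },
  list() {
    return { keys: [...this.data.keys()].map((name) => ({ name })) };
  },
};

// Exfiltrate the whole namespace: list the keys, then fetch each value
// individually - the expensive part that a bulk-read API would collapse.
function dumpNamespace(kv) {
  const backup = {};
  for (const { name } of kv.list().keys) {
    backup[name] = kv.get(name); // one read per key
  }
  return backup;
}

console.log(dumpNamespace(mockKV)); // { a: "1", b: "2" }
```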

I’ve used Heroku’s hosted postgres backups before, they’re quite useful and I imagine when we pursue this feature it will look something vaguely like it. Can’t make any guarantees for now though!


@jason28: I mean the keys’ contents too, of course, which don’t seem to be available in bulk.

@sklabnik: Thanks for the very vague confirmation, at least it’s on the roadmap :wink:

No problem :slight_smile: Thanks for putting up with my vagueness; I don’t want anyone to get too excited and assume that something may be coming soon when it also may not be.

This is on the roadmap too, and I imagine it will land well before an explicit backup feature. Using it to build your own backups is one of the good use-cases for it, and it’s significantly simpler than a full backup solution. It has other use-cases as well.


The trouble I see is object encoding. When you’re requesting one object, it’s easy. When you have multiple objects, say 3 JSON strings, 14 binaries (image files and such), and 10 string values, that becomes a bit non-trivial to boil down into one response…

The simplest answer may be to base64-encode each object regardless of its original type, and stuff them all into a single JSON response.
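That base64-everything bundling might look like the following sketch. The `bundle`/`unbundle` names and the `type` hint field are my own invention, just to show that mixed JSON, binary, and string values can round-trip through one JSON document:

```javascript
// Pack mixed-type values into one JSON response: every value becomes a
// base64 string plus a caller-supplied type hint for decoding later.
function bundle(entries) {
  // entries: [{ key, value (Buffer or string), type }]
  return JSON.stringify(
    entries.map(({ key, value, type }) => ({
      key,
      type, // e.g. "json", "binary", "text"
      data: Buffer.from(value).toString("base64"),
    }))
  );
}

// Reverse the packing: base64 back to raw bytes, type hint preserved.
function unbundle(json) {
  return JSON.parse(json).map(({ key, type, data }) => ({
    key,
    type,
    value: Buffer.from(data, "base64"),
  }));
}

const payload = bundle([
  { key: "settings", value: '{"theme":"dark"}', type: "json" },
  { key: "logo", value: Buffer.from([0x89, 0x50]), type: "binary" },
]);

const restored = unbundle(payload);
console.log(restored[0].value.toString()); // {"theme":"dark"}
```

The obvious cost is the ~33% size overhead of base64, but in exchange the whole response stays valid JSON no matter what the values contain.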


How about a commit to Git (GitHub/Bitbucket) on every new PUT or DELETE?

It would be easy to build this myself, but so as not to hammer the Git service, wouldn’t it be nicer to commit all of a namespace’s mutations once an hour?

That would create an incremental backup system… right?
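The hourly-batch idea above can be sketched as a mutation log that gets drained into one commit per interval instead of one commit per write. Everything here is hypothetical - the log shape, the function names, and the commit object are illustrative, and a real version would push the commit to a Git host’s API on a timer:

```javascript
// Record every PUT/DELETE in a mutation log...
const log = [];

function recordPut(key, value) {
  log.push({ op: "PUT", key, value, ts: Date.now() });
}

function recordDelete(key) {
  log.push({ op: "DELETE", key, ts: Date.now() });
}

// ...then, on a timer (e.g. hourly), drain the log into a single commit
// instead of committing on every write.
function flushToCommit() {
  const commit = {
    message: `kv backup: ${log.length} mutations`,
    changes: [...log],
  };
  log.length = 0; // reset for the next interval
  return commit;
}

recordPut("user:1", "alice");
recordDelete("user:2");
const commit = flushToCommit();
console.log(commit.changes.length); // 2
```

Since each commit only contains the interval’s mutations, replaying the commits in order reconstructs the namespace, which is exactly the incremental property you’re after.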