KV - External backups?

Are there any plans for an official backup feature, such as downloading a complete KV namespace?

Or is this something we need to build ourselves?

I know we can use the API to list and dump KV, but a lot of surrounding work is needed to make that reliable as an automated backup, not to mention the costs.
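For reference, here is a minimal sketch of that list-and-dump loop in JavaScript, the language Workers use. It assumes a KV binding whose list() accepts { prefix, cursor } and returns { keys, list_complete, cursor }, and whose get(key, "text") returns the value or null. InMemoryKV is a hypothetical stand-in so the sketch runs outside Workers, and exportNamespace is an illustrative name, not a Cloudflare tool. It also shows where the reliability and cost problems come from: one read per key, and no consistent view if writes land mid-export.

```javascript
// Hypothetical in-memory stand-in exposing the same list()/get() shape
// as a Workers KV binding, so the sketch can run locally.
class InMemoryKV {
  constructor(entries = {}, pageSize = 2) {
    this.map = new Map(Object.entries(entries));
    this.pageSize = pageSize; // tiny page size to exercise pagination
  }
  async get(key, _type = "text") {
    return this.map.has(key) ? this.map.get(key) : null;
  }
  async list({ prefix = "", cursor } = {}) {
    const names = [...this.map.keys()].filter((k) => k.startsWith(prefix)).sort();
    const start = cursor ? Number(cursor) : 0;
    const page = names.slice(start, start + this.pageSize);
    const next = start + page.length;
    const done = next >= names.length;
    return {
      keys: page.map((name) => ({ name })),
      list_complete: done,
      cursor: done ? undefined : String(next),
    };
  }
}

// Walk every page of list() and read each value: one get() per key,
// and no consistency guarantee if writes happen during the export.
async function exportNamespace(kv, prefix = "") {
  const dump = {};
  let cursor;
  do {
    const page = await kv.list({ prefix, cursor });
    for (const { name } of page.keys) {
      dump[name] = await kv.get(name, "text");
    }
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor !== undefined);
  return dump;
}
```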

I’ll second that, with an additional feature request: a snapshot cookie, so that the complete download is a consistent view of the KV store at a single instant in time, similar to how ZFS snapshots work.

That would be a dream-scenario.

It basically means storing versions of each KV value, which could also enable persistent app history, a kind of “time machine”.

I suspect they’re already doing something similar, since they claim that writes are committed in order. That means, at the least, there is a timestamp associated with each write. Hypothetically, it would work similarly to the cursor in the web API for long lists (listing all keys, for example). The cursor most likely encodes how much to send and where to start, which preserves the “REST-i-ness” of the API.

A similar cursor for the snapshot could encode “don’t send anything written after timestamp X”. So I don’t think it’s that far-fetched, but I also don’t think it’s a “deliver it in a week” feature either.
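A hypothetical illustration of such a snapshot cursor: the token packs the resume position and the snapshot timestamp together, so a stateless API can pin every page to the same instant. Cloudflare's real cursor format is opaque and undocumented; encodeCursor, decodeCursor and nextPage are made-up names for this sketch, which only carries the timestamp along and does not yet filter values by it.

```javascript
// Hypothetical snapshot cursor: resume position + snapshot timestamp,
// packed into an opaque base64 token that the client just echoes back.
function toBase64(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 so any key survives
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin);
}

function fromBase64(b64) {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

function encodeCursor(lastKey, asOf) {
  return toBase64(JSON.stringify({ lastKey, asOf }));
}

function decodeCursor(cursor) {
  const { lastKey, asOf } = JSON.parse(fromBase64(cursor));
  return { lastKey, asOf };
}

// The first request pins the snapshot time; later pages reuse it from
// the cursor instead of the current clock. Assumes limit >= 1.
function nextPage(sortedKeys, cursor, limit, now = Date.now()) {
  const { lastKey, asOf } = cursor
    ? decodeCursor(cursor)
    : { lastKey: null, asOf: now };
  const remaining = sortedKeys.filter((k) => lastKey === null || k > lastKey);
  const keys = remaining.slice(0, limit);
  const done = remaining.length <= limit;
  return {
    keys,
    asOf, // a real store would use this to pick which value version to send
    cursor: done ? null : encodeCursor(keys[keys.length - 1], asOf),
  };
}
```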

The hardest part is handling “this value is too new, so I need to send the previous version”. How hard that is depends heavily on what’s under the hood.
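One possible answer, assuming nothing about KV's actual internals: keep every write as a timestamped version and, for a snapshot at time asOf, binary-search for the newest version at or before it. VersionedStore is a hypothetical in-memory structure. Deletes are stored as tombstones (null) rather than erased, so a snapshot taken before a delete still sees the old value.

```javascript
// Hypothetical multi-version store: each key maps to an array of
// { ts, value } sorted by ts; value === null is a delete tombstone.
class VersionedStore {
  constructor() {
    this.versions = new Map();
  }
  put(key, value, ts) {
    const list = this.versions.get(key) ?? [];
    // Assumes writes arrive in timestamp order ("committed in order").
    if (list.length && ts < list[list.length - 1].ts) {
      throw new Error("out-of-order write");
    }
    list.push({ ts, value });
    this.versions.set(key, list);
  }
  delete(key, ts) {
    this.put(key, null, ts); // tombstone instead of erasing history
  }
  // Newest version with ts <= asOf; null if none exists or it was deleted.
  getAsOf(key, asOf) {
    const list = this.versions.get(key) ?? [];
    let lo = 0, hi = list.length - 1, found = -1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (list[mid].ts <= asOf) { found = mid; lo = mid + 1; }
      else hi = mid - 1;
    }
    return found === -1 ? null : list[found].value;
  }
  // Consistent view of every live key as of time asOf.
  snapshot(asOf) {
    const out = {};
    for (const key of [...this.versions.keys()].sort()) {
      const v = this.getAsOf(key, asOf);
      if (v !== null) out[key] = v;
    }
    return out;
  }
}
```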

Yeah, it’s certainly plausible to implement, since this is common in other key-value databases.

The most important thing for me right now, I think, is addressing deletes, because they cannot be permanent. The other is getting the batch-retrieve-keys feature soon; with that I could write a temporary backup solution, which may be better anyway, since we could then use prefixes.

Did you not see this?

What if Workers could access your SQL/NoSQL database?

KVSTORE.list({"prefix": "...." }) goes a long way toward alleviating this problem: you store data in separate keys under the same prefix. Writing to different keys avoids the overwrite problem, and you can then list just that subset.
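A minimal sketch of the prefix pattern, assuming the same list({ prefix, cursor }) shape as the Workers KV binding, which returns keys in lexicographic order. Each write goes to a new key with a zero-padded timestamp suffix, so listing order matches version order and nothing is overwritten. putVersion, latestVersion, versionKey and the ":v" suffix are hypothetical, and MapKV is only a stand-in for a real binding. Reading the current value means listing every version under the prefix, which is the extra list usage this approach costs.

```javascript
// Hypothetical stand-in with the put()/get()/list() shape of a KV binding.
class MapKV {
  constructor() { this.map = new Map(); }
  async put(key, value) { this.map.set(key, value); }
  async get(key) { return this.map.has(key) ? this.map.get(key) : null; }
  async list({ prefix = "" } = {}) {
    // Returns everything in one sorted page; real KV pages its results.
    const keys = [...this.map.keys()]
      .filter((k) => k.startsWith(prefix))
      .sort()
      .map((name) => ({ name }));
    return { keys, list_complete: true, cursor: undefined };
  }
}

// Zero-padding keeps lexicographic order equal to numeric ts order.
const versionKey = (baseKey, ts) => `${baseKey}:v${String(ts).padStart(15, "0")}`;

// Every write lands on a fresh key, so earlier versions are never overwritten.
async function putVersion(kv, baseKey, value, ts = Date.now()) {
  const key = versionKey(baseKey, ts);
  await kv.put(key, value);
  return key;
}

// The last key under the prefix is the newest version.
async function latestVersion(kv, baseKey) {
  let cursor;
  let newest = null;
  do {
    const page = await kv.list({ prefix: `${baseKey}:v`, cursor });
    if (page.keys.length) newest = page.keys[page.keys.length - 1].name;
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor !== undefined);
  return newest === null ? null : kv.get(newest);
}
```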

I’d eventually like "glob" and limited "regex" matching too, but it’s not critical at the moment.

(Sorry for linking to my own mention; I can’t seem to find the original announcement…)

I’ve seen it, but I don’t consider it a solution: it greatly increases write and list usage just to keep track of which version of each key is the “current” one.

No, KVSTORE.list({"prefix": "...."}) is a recent addition to the Workers KV API. You can list a subset of your keys with one call; in other words, your “batch-retrieve-keys” feature. We’re using it in production and it works great so far.

If that’s not it, what does your “batch-retrieve-keys” feature refer to?

This feature is on our radar, though I cannot promise when it will be implemented.

“Backup” can also mean a few different things in this context. One thing you can do today is “back up” a namespace into another one; we shipped a tool for this: https://github.com/cloudflare/kv-worker-migrate/

However, what you’re talking about is another (and I’d argue, more important) form of backing up, which is exfiltrating your data. This is a very real need, and it’s hard to build it yourself today, since we don’t support bulk reads yet.

I’ve used Heroku’s hosted Postgres backups before; they’re quite useful, and I imagine that when we pursue this feature it will look vaguely like that. Can’t make any guarantees for now though!

@jason28: I mean the keys’ contents too, of course, which don’t seem to be available in bulk.

@sklabnik: Thanks for the very vague confirmation; at least it’s on the roadmap :wink:

No problem :slight_smile: Thanks for putting up with my vagueness; I don’t want anyone to get too excited and assume that something may be coming soon when it also may not be.

Bulk reads are on the roadmap too, and I imagine they’ll come well before any explicit backup feature. Building your own backup on top of them is one of their good use cases, and they’re significantly simpler than a full backup solution. They have other use cases as well.

The trouble I see is object encoding. When you’re requesting one object, it’s easy. When you have multiple objects, say 3 JSON strings, 14 binaries (image files and such), and 10 string values, it becomes non-trivial to combine them into one response…

The simplest answer may be to base64-encode each object regardless of its original type and put them all into a single JSON response.
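A sketch of that “base64 everything into one JSON response” approach. bundle and unbundle are hypothetical helper names; strings are UTF-8 encoded before base64, and binaries are taken as Uint8Array. The cost is roughly a third more bytes on the wire and, as written, losing the original type, which the client has to know or receive separately.

```javascript
function bytesToBase64(bytes) {
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin);
}

function base64ToBytes(b64) {
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

// Encode every value as base64 regardless of its original type:
// strings (JSON or plain) become UTF-8 bytes, binaries pass through.
function bundle(entries) {
  const out = {};
  for (const [key, value] of Object.entries(entries)) {
    const bytes = typeof value === "string" ? new TextEncoder().encode(value) : value;
    out[key] = bytesToBase64(bytes);
  }
  return JSON.stringify(out);
}

// Decoding yields raw bytes; the original type is not recorded.
function unbundle(json) {
  const out = {};
  for (const [key, b64] of Object.entries(JSON.parse(json))) {
    out[key] = base64ToBytes(b64);
  }
  return out;
}
```

A client that knows cfg holds JSON would call new TextDecoder().decode(result.cfg) and then JSON.parse on the text.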

How about a commit to Git (GitHub/Bitbucket) on every new PUT or DELETE?

It would be easy to build this myself, but to avoid hammering the Git service, it would be nicer to commit all of a namespace’s mutations once every hour.

This will create an incremental backup system… Right?
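A minimal sketch of the “collect mutations, commit once an hour” idea: record each PUT/DELETE in a journal, coalesce repeated writes to the same key, and flush the batch as one changeset once the hour is up. MutationJournal, applyChangeset and the commit callback are hypothetical; the callback is where a Git commit (or any other sink) would go, and nothing here talks to GitHub or Bitbucket. Isolate memory in a Worker is not guaranteed to persist, so a real version would need durable storage for the pending batch. Roughly, yes: replaying the changesets in order on top of an initial full export reconstructs the namespace, which is what makes it incremental.

```javascript
const HOUR_MS = 60 * 60 * 1000;

class MutationJournal {
  constructor(commit) {
    this.commit = commit;     // hypothetical sink, e.g. one Git commit per batch
    this.pending = new Map(); // key -> { op, value }; last write wins
    this.windowStart = null;
  }
  record(op, key, value, now = Date.now()) {
    this.maybeFlush(now);
    if (this.windowStart === null) this.windowStart = now;
    this.pending.set(key, op === "DELETE" ? { op } : { op, value });
  }
  put(key, value, now) { this.record("PUT", key, value, now); }
  delete(key, now) { this.record("DELETE", key, undefined, now); }
  // Flush once the current batch is at least an hour old.
  maybeFlush(now = Date.now()) {
    if (this.windowStart !== null && now - this.windowStart >= HOUR_MS) {
      this.flush(now);
    }
  }
  flush(now = Date.now()) {
    if (this.pending.size === 0) return;
    const sorted = [...this.pending].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    this.commit({ from: this.windowStart, to: now, changes: Object.fromEntries(sorted) });
    this.pending.clear();
    this.windowStart = null;
  }
}

// Replaying changesets in order over a base state restores the namespace.
function applyChangeset(state, { changes }) {
  for (const [key, change] of Object.entries(changes)) {
    if (change.op === "DELETE") delete state[key];
    else state[key] = change.value;
  }
  return state;
}
```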