Hello everyone! I’m on the Workers team and we’re trying to gather some feedback on the future of storage in Workers. We’re considering letting Workers access an external database… would you be interested in this? Here are some questions to get you started… we want to hear all of your ideas!
What would you want to build with it?
What databases should we support?
Do you want us to cache queries? If so, how would you want to tune that caching?
Honestly, I think the eventually-consistent KV is suitable for many use cases. However, a similar product that was immediately consistent, but just for small info, integers and such, might be useful. I’m thinking along the lines of treating each instance of a Worker like a thread and having some sort of mutex / cmpexchg mechanism (sketched below). If the resource is a) small, and b) not disk-backed, it might be doable.
Some things that this theoretical db could enable are:
reference counting
mutual exclusion / locking
global stats counters
global state machine latching
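To make this concrete, here’s a rough sketch of the primitive I have in mind. The SYNC binding and its cmpexchg method are entirely made up; nothing like them exists in Workers today:

```js
// Hypothetical strongly-consistent store with an atomic compare-and-exchange.
// SYNC and cmpexchg are invented names, not real Workers APIs.
async function increment(key) {
  while (true) {
    const current = await SYNC.get(key);       // read the current value
    const next = (Number(current) || 0) + 1;
    // Write `next` only if the stored value is still `current`;
    // returns false if another Worker instance won the race.
    if (await SYNC.cmpexchg(key, current, next)) {
      return next;
    }
    // Lost the race: loop and retry against the fresh value.
  }
}
```

All four use cases above reduce to a loop like this, which is why one small, memory-backed compare-and-exchange primitive would go such a long way.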
As for what one could do with this capability? Well, I don’t like spouting buzzwords, but a blockchain’s state could be incremented and tracked globally without a centralized server.
There’s a lot of other fun decentralized systems problems that could be solved and deployed if that tool existed…
Personally, most NoSQL databases I’d use are exposed over HTTPS, to which workers are already well suited. I’d rather see iteration on Workers KV to get closer to something like DynamoDB, with LevelDB-style prefix queries or mutation-based worker triggers.
+1. We can already expose whatever we want from our database through an API, and we’d still have to find a way to handle scaling, multi-AZ availability, etc. ourselves, so it’s hard for me to imagine what problem direct access to the DB would solve that we don’t already have a solution for.
The idea is interesting, but I think the implementation would likely be a pain and nowhere near as performant as Cloudflare Workers are. So I’m not interested.
I’d rather see improvements to Workers KV so that it can behave more like a database.
I agree with the general sentiment: improve KV instead.
@jason28 listed some much-needed use cases, like global stats counters, locking, etc.
I’d also like it to be possible to do time-series logging directly inside Workers, writing straight to KV.
Logging in general is expensive and hard, and doing it from a Worker today makes it even harder and less predictable (having to rely on timeouts). Having it built into Workers would make it easy, fast, and predictable.
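Even with today’s KV you can approximate part of this by giving every event its own time-ordered key, so concurrent Workers never overwrite each other. A minimal sketch, assuming a KV namespace bound as LOGS:

```js
// Write each log event under its own time-ordered key; concurrent Workers
// can't collide because no two events share a key. LOGS is a hypothetical
// KV namespace binding name.
async function logEvent(event) {
  const key = `log:${Date.now()}:${crypto.randomUUID()}`; // sortable and unique
  await LOGS.put(key, JSON.stringify(event), { expirationTtl: 86400 }); // keep for a day
}
```

The writes are still eventually consistent, though, so this helps with durability and key collisions, not with making logging fast and predictable, which is why first-class support would be welcome.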
So, I’ve only just come across this while looking at the feasibility of moving our API to Workers.
While the perfect solution would be to utilise a cloud provided database that was accessible via HTTPS requests, that’s not cost effective for us right now.
We’ve got an existing MySQL cluster we’d love to be able to talk to from workers for pulling information we don’t want to store in KV. At present, we use Sequelize to talk between our Node app and our database.
My main apprehension on this is ensuring we can perform the call to the database, retrieve the results, and send them off without exceeding the CPU limit. As I’m new to serverless computing, I’m unsure if this is a real worry or just me struggling to truly wrap my head around it!
I suppose there’s a few questions on this:
Would this even be remotely feasible if the Workers team decide (or have already decided) to not implement such features?
Is this the correct way to go about migrating our app? Or are we awkwardly cobbling serverless and traditional computing together?
Would it be possible to cache results based on URL and a unique key / JWT in a header?
I believe the CPU limit shouldn’t be a problem. It only counts against the time you are actually running code, not while you are waiting on a database request, so I think you would be all good there.
Yes!
I think it’s reasonable to say that not everyone can migrate everything all at once. One of the nice things about Cloudflare’s history as a proxy is that it’s easy to do some of your API’s actions in a Worker and then pass through anything you haven’t migrated yet.
Yes! You have complete access to the Cache API to cache anything you like.
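For example, one way to key the cache on both the URL and a token header is to fold a hash of the token into a synthetic cache-key URL. A sketch, where the query-parameter trick and the hashing scheme are just one reasonable approach:

```js
// Cache responses per URL + Authorization token by folding a hash of the
// token into the cache key, so raw JWTs never appear in cache keys.
async function cachedFetch(request) {
  const token = request.headers.get('Authorization') || '';
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(token));
  const hash = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');

  const url = new URL(request.url);
  url.searchParams.set('token_key', hash);   // vary the cache key per user
  const cacheKey = new Request(url.toString());

  const cache = caches.default;
  let response = await cache.match(cacheKey);
  if (!response) {
    response = await fetch(request);         // cache miss: go to the origin
    // The response needs cacheable headers (e.g. Cache-Control) for
    // cache.put to actually store it.
    await cache.put(cacheKey, response.clone());
  }
  return response;
}
```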
+1 for focusing on making KV better + adding more features. The feature I’d love to see most in KV is the ability to run distributed transactions.
We’ve progressed quite a bit since the question was originally posed, and I just wanted to add a new thought based on our experience deploying Workers to production.
It would be a really nice feature if we could specify / write some sort of KV object write algo. E.g., right now, write conflicts are resolved globally with the simplistic, but dependable, “overwrite previous with mine” rule.
It would be great if we could specify a write algo, similar to how you can specify different git merge resolution algos. I think this would be really useful for scenarios where you’re appending to a list stored in a single object. Say the first write deletes entry5 while the second write adds entry8 and entry9. Currently, when the two writes occur at different colos within 10 seconds of each other, the end result will still include entry5, because the second write was based on the old version of the list and simply overwrites the first. With a different algo, we could get the desired result (8 and 9 added, 5 deleted). Yes, this is sounding a lot like git, and it would require knowing which version of the object an edit was made against. But damn, it would be really useful.
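To illustrate, here’s what I imagine the API could look like. Every name here is invented; nothing like this exists in KV today:

```js
// Purely hypothetical merge-on-write API, loosely modeled on git merge
// drivers. getWithVersion, baseVersion, and merge are all invented names,
// and values are hand-waved as plain arrays for clarity.
const { value: baseList, version } = await KVSTORE.getWithVersion('mylist');
const myList = baseList.filter((e) => e !== 'entry5'); // my edit: delete entry5

await KVSTORE.put('mylist', myList, {
  baseVersion: version, // the version this edit was made against
  merge(base, theirs, mine) {
    // Set-style three-way merge: keep additions from either side, and
    // drop anything that either side deleted relative to the base.
    const merged = new Set([...theirs, ...mine]);
    for (const entry of base) {
      if (!theirs.includes(entry) || !mine.includes(entry)) merged.delete(entry);
    }
    return [...merged];
  },
});
```

With semantics like that, the entry5 delete and the entry8/entry9 adds would both survive, whichever order the colos see the writes in.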
I know this is an old thread, but I just wanted to add a new thought. It would be nice if either a Worker or its WASM component could access a Postgres database that’s behind Argo Tunnel (cloudflared).
Hey all, wanted to give a brief update on what we’ve been working on and how you can participate.
Our team is continuing to invest heavily in making KV better, faster, and richer in features. But as some of you have pointed out, there is also a need to access your existing database from Workers. That’s why we’ve done some experimentation and are ready for some of you to test out a new feature! Soon Cloudflare Tunnel will allow you to connect your SQL database to Workers. You’ll import a special library into your Worker and be able to query rows and execute commands. If you’re interested in building something with this, fill out this form and we’ll get back to you!
Really excited to see what you all can build with SQL and Workers!
PostgreSQL, as this is the DB used for GitLab installs, whether Enterprise or Community. For self-hosted Git instances, GitLab is unbeatable, imo, and their reference guides (Develop with GitLab | GitLab), beyond ofc https://git-scm.com itself, are unbeatable too, again imo. Now, the one drawback of the GitLab Omnibus package is its relatively high minimum memory requirement: 8GB. That’s fine for blade servers with 128GB RAM, e.g., but for many who do use GitLab’s self-hosted option, that minimum places a strain on their, e.g., 32GB workstations, as that 8GB minimum can and does spike to 80/90% RAM usage. Enter Workers, which could allow a serverless form of dynamic disk caching, offloading RAM into a Cloudflare cache, configured for / triggered by one’s specific needs. Thoughts?
KVSTORE.list({"prefix": "...." }) goes a long way towards alleviating this problem. You just store data in separate keys under the same prefix. This removes the overwrite problem by just writing to different keys, and subsequently listing a subset of them.
I’d eventually like “glob” and limited “regex” support too, but it’s not critical atm.
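For reference, a prefix scan that walks every page of results looks something like this; KVSTORE stands in for whatever namespace binding you’ve configured:

```js
// Collect every key name under a prefix, following the pagination cursor
// until the listing reports it is complete.
async function listAll(prefix) {
  const names = [];
  let cursor;
  do {
    const page = await KVSTORE.list({ prefix, cursor });
    names.push(...page.keys.map((k) => k.name));
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
  return names;
}
```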
db-connect was released in the latest cloudflared update. Follow the quickstart in the link and you’ll be able to deploy a Worker that can talk to a SQL database via a Cloudflare Tunnel.
For now, this is an experimental feature, but we hope that you’ll try it out and give us feedback on what you want added or improved. We encourage you to post here about what use cases db-connect can unlock for you.
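For a taste, the quickstart boils down to something like the following. Treat the package name, options, and response handling here as illustrative; the quickstart itself is the source of truth:

```js
// Illustrative db-connect usage, paraphrased from the quickstart; exact
// names and signatures may differ from what ships.
import { DbConnect } from '@cloudflare/db-connect'

const db = new DbConnect({
  host: 'sql.example.com',  // hostname exposed through your Cloudflare Tunnel
  clientId: '...',          // Cloudflare Access service token credentials
  clientSecret: '...',
})

async function findUser(name) {
  const resp = await db.submit({
    statement: 'SELECT * FROM users WHERE name = ? LIMIT 1',
    arguments: [name],
    cacheTtl: 60,           // optionally cache the query result for 60 seconds
  })
  return await resp.json()
}
```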