Dynamically create d1 instance

olivier.duhart · May 28, 2023, 7:56pm

Hello,

I am wondering if possible to dynamically create d1 databases and bind them to a worker or page fonction.
My typical use case would be to create a d1 base for each user of a saas product, which will natively enable multitenancy and allow great data isolation.

Is there some API for this use case or would it be not recommanded at all ?

Tanks for your answers

Olivier

tobny · December 6, 2023, 8:27pm

Was thinking about the same, would be a really cool setup

Creation could be solved via a REST call: https://developers.cloudflare.com/api/operations/cloudflare-d1-create-database

However I don’t see a proper way of binding D1s dynamically – beside of starting a new worker for each user, which would not feel right

Ideally it would be something like a dedicated pool of D1s. To be able to do something like env.DB_POOL.get(userId).prepare("SELECT * FROM …

More fancy stuff, like migrations for the whole pool would be the next level.

Thanks & Cheers,
Florian

olivier.duhart · December 7, 2023, 9:27am

I’ve got an answer through cloudflare developer’s discord.
the way to go is through the API : https://developers.cloudflare.com/api/operations/pages-project-update-project

with a PATCH request and a payload similar to

curl --request PATCH \
  --url https://api.cloudflare.com/client/v4/accounts/:account_identifier/pages/projects/:project_name \
  --header 'Content-Type: application/json' \
  --header 'Authorization: <api-token>' \
  --data '{
  "deployment_configs": {
   "production": {
     "d1_databases": {
       "D1_BINDING": {
          "id": "445e2955-951a-43f8-a35b-a4d0c8138f63"
        }
      }
    }
  }

I’ve been working on a poc to know how if it could work, but pause this project for a while now.
Later, I am hoping that given the D base name is ‘d1Name’ I could be able to access the binding throught some js code like :

var d1Fromcontext = context.env[d1Name]
 await d1Fromcontext.prepare(
            'INSERT INTO calendars (groupe,team, type, calendar) VALUES (?1, ?2, ?3, ?4)'
        )
                .bind('group', 'team', 42, 'ics')
                .run()

I don’t remember if it worked (at least D1 creation is ok) . and as this can not be tested locally I had hard time working it out.

tobny · December 7, 2023, 12:33pm

The PATCHing looks to me like it fiddles with the global config of the worker? So all of them would be affected? Even if this works some how, it would not feel right.

@celso Do you know anything? If this is a scenario that is already supported in some ways? Or will be in the future?

olivier.duhart · December 15, 2023, 10:46am

I 've managed to get something working.
there is only one pages project that bind to has many D1 that have been dynamically provisionned.
you can find code here : cf-d1-provisioning.

It’s not perfect, error management is awful and some edge cases are surely not managed.

but the happy path is working. you can ask for a D1 instance creation and then add create table and data.
I made it alive on a free plan (so limited to 10 D1 instances). You can find it here : cf-d1-provisioning PoC

I insist this is only a proof of concept

tobny · December 15, 2023, 12:11pm

Wow! Amazing stuff @olivier.duhart!

So with a setup like this, to solve concurrency, I would bind the tenants D1 at their login and unbind when they logout(?) Each session/login just gets access to “their” D1 via mapping.

I also like a side aspect your code shows, that it is possible to execute SQL without even binding the D1, via a REST call! I wasn’t aware of this feature. This could simplify things To use D1 but ignore the binding.
Question would be performance or if there are other drawbacks. Do you know any? Or if there actually is a performance penalty? Or is the binding actually just the same process, but hidden?

Thanks a lot for your cool work and sharing!

olivier.duhart · December 15, 2023, 1:31pm

I was not thinking about using a D1 for each session, rather a dynamic D1 for each customer (think a accounting folder or CRM folder) Then each customer would have its own “namespace”. And debugging a folder would be a breaze since you can download the D1 backup on you dev computer.

Your use case is interesting though.

As for performance there is a cost when creating , deleting, binding , unbinding D1 instance , especially for creation. So using at a session level may slow down the login process.
For my use case, I don’t see any particular drawback compared to a statically created and bound D1.

The binding is the same as the one done through the CF dashboard so there should be no difference

tobny · December 15, 2023, 1:56pm

Ok, I get your use case, makes sense!

Just to make sure I get it right, in an extreme case, where 100 concurrent users would use your app, each in a different CRM Folder, this would mean all 100 D1s would be bound to every single worker called in that time span (but each accessing only the one D1 mapped by their tenant name)?

And yes, the data separation is really the cool thing!

I guess my binding-less approach would make more sense to less data and request intense use cases. I think it would also miss out on things like prepared statements

olivier.duhart · December 20, 2023, 7:44am

hello, you get it. a D1 instance for each customer. The management of access to instance D1 based on the user is the responsibility of the user of this PoC. There is no automation.

I still struggle with a problem. When creating a D1 instance, the D1 appears bound to the worker in the cloudflare dashboard but we can not access it through env["TENANT"]. The only way i’ve found to make it accessible is to redeploy the worker ! Obvisouly that’s not really cool as it means a down time for all users (even though it is quite fast).
I will explore this issue in 3 ways :

automate the redeploy (which means downtime affecting all existing tenants)
more complicated : provision a new worker (and D1) for each tenant which would not disturb existing tenants.
maybe a way to refresh the work env exists…

tobny · December 20, 2023, 10:42am

Is there actually a downtime when redeploying? I would expect it is a zero downtime process?

Just the clients that triggers the D1 creation need to do a bit of polling to see when it hits a new worker process that can access the newly created D1.

One additional thing that popped in my mind, but I doubt it, if the new D1 instances are already available on a worker process restart (I mean the normal thing that happens when the worker isn’t used for a while and it gets killed and restarted with the next hit)? I also don’t know if it is possible to force such a restart.

timkieu8288 · June 21, 2024, 3:41am

Hello. I am also trying to solve this multi-tenancy D1.
One option that we can create in advanced a list of free_available D1 DBs before they are injected into new customers sign-up flows and mapped. A fixed maintaining schedule can be applied to update and add more D1 if the free D1 pool is full.
Another option is using all D1 via its query API for any tenant Cloudflare API Documentation

timkieu8288 · June 21, 2024, 3:50am

However, the D1 APi query is not good if we use ORM like Drizzle.
Thus, I will focus on creating a free waiting D1 instances and map their namespace into ENV array. But this way we maynot use up-to-max- 5,000 or 50,000 D1 DBS of 1 paid CF account.
CF D1 team should add multi-tenacy feature for us ASAP!