D1 latency from worker high

(Cross post from Discord, didn’t receive a reply there.)

Hi all, I’m wondering if someone can help me debug why my D1 latency is so high, or how to improve it?

Context: I’m experimenting with a very small D1 table (25 rows), and I’m doing a SELECT on an indexed unique column.

I’m seeing up to 500-1500ms of latency doing a read from a Worker, but the console is telling me the query latency is always < 1ms. If the console is telling the truth, then the query is fast and the network is slow.

My query latencies seem to be bimodal:

  • if I’ve made a request in the last ~minute, my observed latency is 60-120ms (lower w/ smart placement turned on)
  • if I haven’t, my first request will be very slow, always above 500ms, often 1300ms - 1500ms, sometimes 1900ms+

The complete code I’m measuring is:

let d1ReadResults = await env.DB.prepare("SELECT * FROM Entries WHERE hash = ?1")
    .bind(hash)
    .first();

I was expecting D1 latency from a worker to be very fast, but right now it’s often slower than my network calls to services outside of Cloudflare! I’d love to improve this if I can.

Related post from 1 year ago:

On the return object there should be a meta object with a duration attribute (docs). This tells you how long the D1 query itself took to complete. What is the time there? My guess is that it is pretty low, and it might be that your database is farther away, which causes long round-trip times.
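Checking this from a Worker might look like the sketch below (`db` stands in for the `env.DB` binding and `hash` for the lookup key from the original post; `.all()` resolves to `{ results, meta }`, where `meta.duration` is the query time in ms as reported by D1):

```javascript
// Sketch: compare D1's self-reported query time (meta.duration) with the
// wall-clock time the Worker actually waits for the result.
async function timedLookup(db, hash) {
  const start = Date.now();
  const { results, meta } = await db
    .prepare("SELECT * FROM Entries WHERE hash = ?1")
    .bind(hash)
    .all();
  return {
    row: results[0],
    queryMs: meta.duration,       // time D1 says the query took
    totalMs: Date.now() - start,  // wall-clock time seen by the Worker
  };
}
```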

Thank you very much for the reply @Cyb3r-Jak3.

As you suggested, I checked out the meta object on the D1 query to verify it was reporting the same thing as the dashboard. Sure enough, all the queries run in < 1ms. Here’s a handful of requests w/ query time and the total time the request took:

query time   total request time
0.5005ms     519ms
0.9635ms     514ms
0.5335ms     79ms
0.2667ms     67ms
0.3291ms     1632ms
0.1922ms     1630ms
0.2747ms     47ms
0.3482ms     58ms

You can see the pattern I described: after some very high latency requests (> 500ms), there are some moderately fast ones (< 80ms). I waited a minute, queried again, and the pattern repeated, with very high latency (>1600ms) followed by lower (< 60ms).

I don’t think it’s possible that the database is far away from the worker, because that would imply a minimum latency: no request could complete faster than that round-trip time. Instead, we see some requests completing quickly and others taking a very long time.
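To make the argument concrete, here are the total times from the table above (a quick sketch, values copied from my measurements):

```javascript
// Total request times from the table above, in ms.
const totals = [519, 514, 79, 67, 1632, 1630, 47, 58];

// If the database were simply far away, every request would pay the same
// round-trip floor. Instead, the fastest request is far below the slowest:
const fastest = Math.min(...totals);
const slowest = Math.max(...totals);
console.log(fastest, slowest);
```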


Hmm, that is strange. It almost seems like your Worker is getting evicted very quickly after running. If you are able to, I would open a ticket, as I am stumped, but maybe another @MVP would know.

It’s weird, right? Yeah, I start profiling after the worker has already begun execution and run some simple code (an auth check), so I’m not even measuring how long the worker takes to start.

I wonder whether there’s something in how Cloudflare assigns the worker a D1 server to talk to. Maybe there’s some initial latency there, plus short (~1 minute) caching.
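If the cold path really is some per-worker setup that decays after about a minute, one hypothetical mitigation (untested; the handler and config shown are just the standard Workers Cron Trigger shape, and whether it helps is exactly the open question above) would be to ping D1 on a schedule:

```javascript
// Hypothetical sketch: a Cron Trigger firing every minute issues a cheap
// query so user requests don't pay the cold-path cost.
// In a real Worker this object would be the module's default export, and
// wrangler.toml would need: [triggers] crons = ["* * * * *"]
const worker = {
  async scheduled(event, env, ctx) {
    // Keep the response alive past the handler's return without blocking it.
    ctx.waitUntil(env.DB.prepare("SELECT 1").first());
  },
};
```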

Thank you again for the reply, and I’m glad you suggested opening a ticket, because I didn’t realize I could do that! I will give it a shot when I have bandwidth and report back if I learn anything.

I think this is due to D1 not being distributed unless it receives larger amounts of traffic (like KV) - so yes, you’d get very high latency if the request goes from, say, China or Russia to the US.

So are you suggesting that it will stay the same unless there is more traffic from China/Russia?

I’m saying that Cloudflare optimizes the local cache based on the number of requests. If there are very few requests for the data, it’s served from low-cost datacenters; the more requests that are made, the closer to their source the data is moved.

Thank you. Do you have any idea of the metric thresholds (request count / request frequency / load time) at which CF decides it’s time to move the data closer?
I’m asking because I’m considering moving to CF and was searching for latency metrics. When I googled KV, I stumbled upon a benchmark against KV from (I guess) one of your competitors, and the results were questionable:

  • P90: 742ms
  • P99: 1,336ms
    for a 30 minute stress test.

Then, after increasing the load and making all requests from the same region, it got better:

  • P90: 742ms → 115ms
  • P99: 1,336ms → 560ms

But still makes me wonder!
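For reference, P90/P99 are just order statistics over the latency samples. A minimal nearest-rank sketch (the sample values reuse the timings from earlier in this thread, not the benchmark’s):

```javascript
// Nearest-rank percentile: the smallest sample such that p% of the
// samples are at or below it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.ceil((p / 100) * sorted.length) - 1];
}

const latencies = [47, 58, 67, 79, 514, 519, 1630, 1632];
console.log(percentile(latencies, 50)); // 79
console.log(percentile(latencies, 90)); // 1632
```

With a bimodal distribution like this, the tail percentiles sit in the slow mode, which is why a few cold requests can dominate P90/P99.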

PS: I don’t know if I can post a link to that benchmark here, but you can google it easily with “cloudflare kv latency benchmark”
