This is quite an open-ended question, but I’m looking to store time-series data within Workers KV.
Now, I realise that KV is optimised for a high volume of reads, but not for reading lots of keys all at once.
One possible solution that has been suggested to me is to use entry metadata: when I call /list on the KV namespace, each key's metadata is returned along with the key names, so I get all of it in one go.
Now I’m wondering: would it be faster to fetch the data through metadata in a single list call, or to iterate through the keys and read them one by one? Or is there a better approach altogether? Thoughts?
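To make the two options in the question concrete, here is a minimal TypeScript sketch that counts round trips for each. `MemoryKV`, `readViaMetadata` and `readViaGet` are hypothetical names; `MemoryKV` is an in-memory stand-in that only mimics the shape of KV's `put`/`get`/`list` (paginated with a cursor, up to 1000 keys per page), not its latency or eventual consistency. Keep in mind that KV metadata is small (Cloudflare documents a limit of 1024 bytes per key), so the metadata route only works if each sample fits.

```typescript
// Hypothetical in-memory stand-in for the subset of Workers KV used here.
// It only counts calls; real KV calls are remote and eventually consistent.
interface ListResult<M> {
  keys: { name: string; metadata?: M }[];
  list_complete: boolean;
  cursor?: string;
}

class MemoryKV<M> {
  private store = new Map<string, { value: string; metadata?: M }>();
  calls = { get: 0, list: 0 };

  async put(key: string, value: string, opts: { metadata?: M } = {}): Promise<void> {
    this.store.set(key, { value, metadata: opts.metadata });
  }

  async get(key: string): Promise<string | null> {
    this.calls.get++;
    const entry = this.store.get(key);
    return entry ? entry.value : null;
  }

  // Returns keys in lexicographic order, at most `limit` per page.
  async list(opts: { prefix?: string; limit?: number; cursor?: string } = {}): Promise<ListResult<M>> {
    this.calls.list++;
    const limit = opts.limit ?? 1000;
    const start = opts.cursor ? Number(opts.cursor) : 0;
    const names = Array.from(this.store.keys())
      .filter((k) => k.startsWith(opts.prefix ?? ""))
      .sort();
    const page = names.slice(start, start + limit);
    const done = start + limit >= names.length;
    return {
      keys: page.map((name) => ({ name, metadata: this.store.get(name)!.metadata })),
      list_complete: done,
      cursor: done ? undefined : String(start + limit),
    };
  }
}

// Option 1: read every sample straight out of list() metadata (no get() calls).
async function readViaMetadata<M>(kv: MemoryKV<M>, prefix: string): Promise<M[]> {
  const out: M[] = [];
  let cursor: string | undefined;
  do {
    const page = await kv.list({ prefix, cursor });
    for (const k of page.keys) if (k.metadata !== undefined) out.push(k.metadata);
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
  return out;
}

// Option 2: list the keys, then get() each value individually (one read per key).
async function readViaGet<M>(kv: MemoryKV<M>, prefix: string): Promise<string[]> {
  const out: string[] = [];
  let cursor: string | undefined;
  do {
    const page = await kv.list({ prefix, cursor });
    for (const k of page.keys) {
      const v = await kv.get(k.name);
      if (v !== null) out.push(v);
    }
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
  return out;
}
```

For N samples, option 1 costs roughly N/1000 list calls, while option 2 adds N individual reads on top of that, each of which may be a cold read.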
You’d probably want to store the time as the key and whatever you need in the value. Since list returns keys in lexicographic order, you can then go through them in chronological order. Performance-wise? Well… you won’t be reading the same values repeatedly, so you’ll be doing lots of cold reads, which are not quick. Here are some average times:
| Location | Average cold read (ms) |
|---|---|
| London, United Kingdom | 65.59 |
| Frankfurt, Germany | 59.386 |
| Los Angeles, United States | 107.671 |
| New York, United States | 83.297 |
| Toronto, Canada | 76.365 |
| Sydney, Australia | 213.578 |
| São Paulo, Brazil | 94.582 |
| Tokyo, Japan | 177.254 |
| Hong Kong (GCP asia-east2) | 251.469 |
| Russia (Yandex ru-central1-b) | 126.571 |
(blog post soon™)
I’m not sure how metadata would really help here, since the key is what determines list ordering. It makes sense to use the time as the key and your data as the value.
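A minimal sketch of a time-as-key scheme, assuming a hypothetical `<series>:<timestamp>` key format. Because list orders keys lexicographically rather than numerically, the timestamp needs a fixed width (here zero-padded epoch milliseconds); otherwise `"5"` would sort after `"1000"`. The optional inverted key is a common trick for getting newest-first order from a plain list. `timeKey`, `reverseTimeKey` and `parseTimeKey` are illustrative names, not part of the KV API.

```typescript
// Hypothetical key scheme: "<series>:<zero-padded epoch ms>".
// 13 digits covers epoch milliseconds until the year 2286.
const WIDTH = 13;
const MAX_MS = 10 ** WIDTH - 1;

// Forward key: lexicographic order == chronological order (oldest first).
function timeKey(series: string, ms: number): string {
  return `${series}:${String(ms).padStart(WIDTH, "0")}`;
}

// Inverted key: lexicographic order == reverse chronological order (newest first).
function reverseTimeKey(series: string, ms: number): string {
  return `${series}:${String(MAX_MS - ms).padStart(WIDTH, "0")}`;
}

// Recover the number stored after the last ":" in a key.
function parseTimeKey(key: string): number {
  return Number(key.slice(key.lastIndexOf(":") + 1));
}
```

With forward keys, listing with `prefix: "cpu:"` walks the series oldest to newest; for a reverse key, the original timestamp is `MAX_MS - parseTimeKey(key)`.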
If my preliminary design is correct, it’s actually rather expensive to use KV and the Workers platform for storing time-series data (such as IoT or server monitoring data).
If you submit “one metric” per KV entry “each minute” (and keep the data in KV for a year), the write requests alone cost, “per month”: 43,800 writes (one per minute in an average month) / 1,000,000 × $5.00 = $0.219. That’s for just one metric on one server. You could combine a few metrics (e.g. all the CPU figures such as “user”, “system”, “idle”) into one KV value, but that is still $0.219 per service per month.
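The per-metric arithmetic above, written as a small TypeScript sketch. It assumes the $5.00-per-million-writes price quoted in the post and an average month of 365 × 24 × 60 / 12 = 43,800 minutes; it covers KV writes only (no reads, storage, or Workers requests). `monthlyWriteCost`, `CpuSample` and `packCpuSample` are illustrative names, and packing the CPU figures as one JSON value is just one way to "combine a few metrics into one KV value".

```typescript
// Pricing assumption from the post: $5.00 per million KV writes.
const USD_PER_MILLION_WRITES = 5.0;
// Average month: 365 days * 24 h * 60 min / 12 = 43,800 minutes.
const MINUTES_PER_MONTH = (365 * 24 * 60) / 12;

// Monthly write cost for one KV key series written every `intervalMinutes`.
function monthlyWriteCost(intervalMinutes = 1): number {
  const writes = MINUTES_PER_MONTH / intervalMinutes;
  return (writes * USD_PER_MILLION_WRITES) / 1e6;
}

// Hypothetical shape for one minute of CPU figures for a single service.
interface CpuSample {
  user: number;
  system: number;
  idle: number;
}

// Packing several metrics into one JSON value keeps it at one write per minute.
function packCpuSample(s: CpuSample): string {
  return JSON.stringify(s);
}

const separate = 3 * monthlyWriteCost(); // user, system, idle as three keys
const combined = monthlyWriteCost(); // one packed value per minute

console.log(separate.toFixed(3)); // "0.657"
console.log(combined.toFixed(3)); // "0.219"
```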
As an example, our Nagios server (which runs on a single machine with 2x SSDs) runs checks and records the results in RRD databases for 40,000 services, each with a few metrics, “each minute”. That would cost $8,760.00 per month for the KV writes alone (40,000 × $0.219). On top of that, you also need to include the costs for Workers, and maybe for some Durable Objects, depending on your design. You would also need to query (read) the datastore frequently, both to monitor the services and to display charts.
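Scaling the same estimate to a whole fleet, as a TypeScript sketch under the same assumptions ($5.00 per million writes, 43,800 minutes per month, writes only). `fleetMonthlyWriteCost` is a hypothetical helper; the `intervalMinutes` parameter is only there to show how sensitive the total is to the check interval.

```typescript
// Same assumptions as the post: $5.00 per million KV writes,
// and an average month of 43,800 minutes.
const USD_PER_MILLION_WRITES = 5.0;
const MINUTES_PER_MONTH = (365 * 24 * 60) / 12;

// Monthly KV write cost for `services` services, each writing `keysPerService`
// entries (e.g. one packed JSON value) every `intervalMinutes` minutes.
function fleetMonthlyWriteCost(services: number, keysPerService = 1, intervalMinutes = 1): number {
  const writes = (services * keysPerService * MINUTES_PER_MONTH) / intervalMinutes;
  return (writes * USD_PER_MILLION_WRITES) / 1e6;
}

console.log(fleetMonthlyWriteCost(40000).toFixed(2)); // "8760.00"
console.log(fleetMonthlyWriteCost(40000, 1, 5).toFixed(2)); // checks every 5 minutes: "1752.00"
```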