I’m trying to better understand how sampling works in Workers Analytics Engine.
I’ve read the docs (Workers Analytics Engine SQL API · Cloudflare Analytics docs), and I understand that at high rates, not all events are stored for a given index key.
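To make my mental model concrete, here is a toy sketch of how I picture sampling. It is NOT Cloudflare’s actual sampling algorithm, which I don’t know. It only assumes what I understood from the docs: each stored row carries a `_sample_interval` saying how many original events it stands for, so totals should be estimated with the equivalent of `SUM(_sample_interval)` (and `SUM(_sample_interval * double1)`) rather than `COUNT()`. The column names `blob1`/`double1` follow the SQL API’s naming; `sampleEvery`, `estimatedCount` and `estimatedSum` are hypothetical helpers of mine.

```typescript
// Toy model of sampled storage. This is NOT Cloudflare's real sampling
// algorithm; it only illustrates how a per-row sample interval is used.

interface RawEvent {
  blob1: string;   // a dimension, e.g. event type
  double1: number; // a numeric value, e.g. units consumed
}

interface StoredRow extends RawEvent {
  _sample_interval: number; // how many original events this row represents
}

// Hypothetical sampler: keep every `interval`-th event and tag it with the interval.
function sampleEvery(events: RawEvent[], interval: number): StoredRow[] {
  const rows: StoredRow[] = [];
  for (let i = 0; i < events.length; i += interval) {
    rows.push({ ...events[i], _sample_interval: interval });
  }
  return rows;
}

// Analogue of SQL `SUM(_sample_interval)`: estimated number of original events.
function estimatedCount(rows: StoredRow[]): number {
  return rows.reduce((acc, r) => acc + r._sample_interval, 0);
}

// Analogue of SQL `SUM(_sample_interval * double1)`: estimated total of double1.
function estimatedSum(rows: StoredRow[]): number {
  return rows.reduce((acc, r) => acc + r._sample_interval * r.double1, 0);
}

// 1000 identical events, sampled 1-in-10.
const events: RawEvent[] = Array.from({ length: 1000 }, () => ({ blob1: "api_call", double1: 2 }));
const rows = sampleEvery(events, 10);
console.log(rows.length, estimatedCount(rows), estimatedSum(rows)); // 100 1000 2000
```

With identical events the estimate happens to be exact; with a mix of events, any per-dimension breakdown is only as good as the sample, which is what worries me for billing.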
I have a few questions:
- What is considered a “high rate”?
- What happens to all the `blobs` and `doubles` when data is sampled? How can they be used?
- How reliable is `_sample_interval`? Let’s say I want to use analytics for metered billing: I charge my customers $x for every 1000 events with specific `blobs`, which I use as “event dimensions”. Will that give accurate billing?
- The docs say “Sampling is based on the index of your dataset so that only indexes that receive large numbers of events will be sampled”. Is sampling the only thing the index field is used for? Can I run queries based on `blobs` and aggregate data across indices?
- I use one of the `blob` fields to store a reference ID, which is basically a unique identifier that I can later use to troubleshoot an event and connect it to data in my database. I might occasionally query by it, but mostly I just display it as a discrete value so events can be viewed as an “activity log” rather than as aggregated data. Is this a good practice?
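For context, this is a simplified sketch of how I write and query events. `AnalyticsDataset` is a minimal local stand-in for the runtime’s `AnalyticsEngineDataset` binding (only `writeDataPoint({ indexes, blobs, doubles })` is modelled). The dataset name `usage_events`, the blob order, and the helpers `recordEvent`/`monthlyUsageQuery` are my own hypothetical choices, not anything from the docs. The query weights rows by `_sample_interval`, which is my understanding of how counts are meant to be computed.

```typescript
// Minimal local stand-in for the runtime's AnalyticsEngineDataset binding;
// only writeDataPoint is modelled here.
interface DataPoint {
  indexes?: string[];
  blobs?: string[];
  doubles?: number[];
}
interface AnalyticsDataset {
  writeDataPoint(point: DataPoint): void;
}

// Hypothetical event shape from my app.
interface UsageEvent {
  customerId: string;  // used as the index (the sampling key)
  eventType: string;   // "event dimension" I bill on
  region: string;      // another dimension
  referenceId: string; // unique id linking back to my database
  units: number;
}

// My layout: index1 = customerId, blob1 = eventType, blob2 = region,
// blob3 = referenceId, double1 = units.
function recordEvent(dataset: AnalyticsDataset, e: UsageEvent): void {
  dataset.writeDataPoint({
    indexes: [e.customerId],
    blobs: [e.eventType, e.region, e.referenceId],
    doubles: [e.units],
  });
}

// Billing-style SQL API query: weight rows by _sample_interval instead of
// using COUNT(). customerId is assumed to be a trusted value; no escaping is done.
function monthlyUsageQuery(dataset: string, customerId: string): string {
  return [
    "SELECT blob1 AS event_type, SUM(_sample_interval) AS events,",
    "  SUM(_sample_interval * double1) AS units",
    `FROM ${dataset}`,
    `WHERE index1 = '${customerId}'`,
    "  AND timestamp > NOW() - INTERVAL '30' DAY",
    "GROUP BY blob1",
  ].join("\n");
}

// Usage with an in-memory fake dataset.
const written: DataPoint[] = [];
const fakeDataset: AnalyticsDataset = { writeDataPoint: (p) => { written.push(p); } };
recordEvent(fakeDataset, { customerId: "cust_42", eventType: "api_call", region: "eu", referenceId: "ref_123", units: 1 });
console.log(written[0].blobs); // [ 'api_call', 'eu', 'ref_123' ]
console.log(monthlyUsageQuery("usage_events", "cust_42"));
```

The reference ID lives in a blob rather than the index so that the index stays the customer ID, which, if I read the docs correctly, is what sampling is keyed on.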