I’m trying to better understand how sampling works in Workers Analytics Engine.
I’ve read the docs (Workers Analytics Engine SQL API · Cloudflare Analytics docs), and I understand that at high rates not all events will be stored for a given index key.
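To check my understanding of the sampling model, here is a minimal TypeScript sketch of how I assume totals are reconstructed from sampled rows. Each stored row carries `_sample_interval`, the number of original events it stands in for. An estimated count is then the sum of the intervals, and an estimated total of a double weights each value by its interval (which I take to be what `SUM(_sample_interval)` and `SUM(_sample_interval * double1)` do in the SQL API). The `SampledRow` type and the helper names are my own, not part of any Cloudflare API.

```typescript
// Hypothetical shape of a row returned by a query. `_sample_interval` is the
// column named in the docs; `double1` stands for any numeric value I wrote.
interface SampledRow {
  _sample_interval: number; // how many original events this stored row represents
  double1: number;          // e.g. a "units" value written with the event
}

// Estimated number of original events: each stored row counts as
// `_sample_interval` events (my analogue of SUM(_sample_interval)).
function estimateCount(rows: SampledRow[]): number {
  return rows.reduce((acc, r) => acc + r._sample_interval, 0);
}

// Estimated total of a double: weight each value by its sample interval
// (my analogue of SUM(_sample_interval * double1)).
function estimateSum(rows: SampledRow[]): number {
  return rows.reduce((acc, r) => acc + r._sample_interval * r.double1, 0);
}

// Estimated average: weighted sum divided by weighted count.
function estimateAvg(rows: SampledRow[]): number {
  const n = estimateCount(rows);
  return n === 0 ? 0 : estimateSum(rows) / n;
}

// Example: the first two rows were sampled 1-in-10, the third was stored unsampled.
const rows: SampledRow[] = [
  { _sample_interval: 10, double1: 2 },
  { _sample_interval: 10, double1: 4 },
  { _sample_interval: 1, double1: 5 },
];

console.log(estimateCount(rows)); // 21
console.log(estimateSum(rows));   // 65
```

If this is right, every total is an estimate rather than an exact tally, which is what my billing question is really about.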
I have a few questions:
- What is considered a “high rate”, i.e. at what point does sampling kick in?
- What happens to all the `doubles` when data is sampled? How can they be used?
- How reliable is `_sample_interval`? Let’s say I want to use analytics for metered billing: I charge my customers $x for every 1000 events with specific `blobs`, which I use as “event dimensions”. Will that produce accurate billing?
- The docs say “Sampling is based on the index of your dataset so that only indexes that receive large numbers of events will be sampled”. Is sampling the only thing the index field is used for? Can I run queries filtered by `blobs` and aggregate data across indices?
- I use one of the `blob` fields to store a reference id, essentially a unique identifier I can use later to troubleshoot and to link the event to data in my database. I might occasionally query by it, but mostly I just display it as a discrete value so events can be viewed as an “activity log” rather than as aggregated data. Is this good practice?
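For context, this is roughly how I write events, as a minimal sketch. `writeDataPoint` with `indexes`, `blobs` and `doubles` is the Workers binding method, but the interfaces below are local stand-ins trimmed to what I use (in a real Worker the type comes from `@cloudflare/workers-types`), and `UsageEvent`, `recordUsage` and the field values are hypothetical. Blobs are positional, so the event type, region and reference id land in `blob1`, `blob2` and `blob3`, and the metered units in `double1`.

```typescript
// Local stand-ins for the Workers Analytics Engine binding types, reduced to
// the single method used here (assumption: trimmed, not the full real type).
interface AnalyticsEngineDataPoint {
  indexes?: string[];
  blobs?: string[];
  doubles?: number[];
}
interface AnalyticsEngineDataset {
  writeDataPoint(point: AnalyticsEngineDataPoint): void;
}

// Hypothetical event shape for a metered API.
interface UsageEvent {
  customerId: string;  // sampling key, written as the index
  eventType: string;   // dimension -> blob1
  region: string;      // dimension -> blob2
  referenceId: string; // unique id for troubleshooting -> blob3
  units: number;       // metered quantity -> double1
}

// Hypothetical helper: one data point per billable event.
function recordUsage(dataset: AnalyticsEngineDataset, e: UsageEvent): void {
  dataset.writeDataPoint({
    indexes: [e.customerId],
    blobs: [e.eventType, e.region, e.referenceId],
    doubles: [e.units],
  });
}

// Usage with an in-memory fake dataset, so the sketch runs outside Workers.
const written: AnalyticsEngineDataPoint[] = [];
const fakeDataset: AnalyticsEngineDataset = {
  writeDataPoint: (p) => {
    written.push(p);
  },
};

recordUsage(fakeDataset, {
  customerId: "cust_123",
  eventType: "api_call",
  region: "eu",
  referenceId: "req_8f2c",
  units: 1,
});
console.log(written[0].blobs); // the three blob values, in positional order
```

If I read the sampling docs correctly, a heavily sampled index would keep only some of these rows, so some reference ids might never be stored at all.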