Analytics Engine data structure

For Workers & Pages, what is the name of the domain?

cde.holidayextras.com

What is the issue or error you’re encountering

Too easy to mismatch columns

What are the steps to reproduce the issue?

The analytics engine stores data in a very disjointed way and it only accepts numerical values.

With a maximum of 20 labels per metric, each label name is stored as a string value in its own database column, arbitrarily named blob1, blob2, … all the way to blob20. The label values are stored in separate columns, arbitrarily named double1, … up to double20.

Given the following object of example data:

{
  "labels": {
    "foo": "bar",
    "biz": "baz",
    "qux": "quux"
  },
  "samplingIndex": "abc123"
}

I need to convert this into the format accepted by the writeDataPoint method.

However, as you can see from the example data above, the values are not numerical.

It also raises a potentially serious issue in terms of maintenance and scalability. For example, even if all the values are numbers and can be stored, what happens to the data if a dev later adds a new label key-value pair to the beginning or middle of the object? Suddenly the label name at blob1 is completely different. Pull requests that include changes to metrics will need a consistently higher level of scrutiny.

dataSet.writeDataPoint({
  blobs: [...Object.keys(data.labels)],
  doubles: [...Object.values(data.labels)],
  indexes: [data.samplingIndex],
})
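To make the ordering risk concrete, here is a small sketch in plain TypeScript (no Cloudflare bindings involved). The helper name `positionsOf` and the extra `env` label are made up for illustration; the point is only that when columns are assigned by object key order, adding one key shifts every label after it.

```typescript
// Hypothetical helper: maps each label name to the blob column it would
// land in when columns are assigned purely by object key order.
function positionsOf(labels: Record<string, unknown>): Record<string, string> {
  const out: Record<string, string> = {};
  Object.keys(labels).forEach((key, i) => {
    out[key] = `blob${i + 1}`;
  });
  return out;
}

const before = positionsOf({ foo: "bar", biz: "baz", qux: "quux" });

// A later change adds a new label at the front of the object.
const after = positionsOf({ env: "prod", foo: "bar", biz: "baz", qux: "quux" });

console.log(before.foo, after.foo); // blob1 blob2
```

Rows written before and after the change would then disagree about what blob1 means, which is exactly the mismatch described above.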

Why can’t Cloudflare just store the data as a JSON blob? I understand it would need to be validated to only be one level deep and probably still impose a size limit, but there are other types of metrics data (other than numbers) that need to be tracked. A JSON blob would better ensure the key-value pairs are matched correctly.

In the meantime, can I have suggestions on how I might track such data?

This is how I handle analytics both internally and externally: blob-builds/worker/src/analytics/analytics.ts at main · WalshyDev/blob-builds · GitHub

We maintain one interface, which we add to, and then just use the fields as normal in the write call.

It lets us keep a very consistent, readable data interface throughout the app and still write in a compliant way.
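A minimal sketch of that general idea follows. It is not copied from the linked file: the `Analytics` fields, `toDataPoint`, and `write` are hypothetical names, and `DataPoint` and `Dataset` are local stand-ins for the binding types so the snippet is self-contained. It assumes, as the `writeDataPoint` field names suggest, that strings go in `blobs` and numbers in `doubles`.

```typescript
// Local stand-ins for the Analytics Engine binding types (assumed shapes).
interface DataPoint {
  blobs?: (string | null)[];
  doubles?: number[];
  indexes?: string[];
}
interface Dataset {
  writeDataPoint(point: DataPoint): void;
}

// Hypothetical metric fields: the single place where new fields get added.
interface Analytics {
  route: string;      // blob1
  country: string;    // blob2
  cached: boolean;    // double1, stored as 0/1
  durationMs: number; // double2
}

// The column order is fixed here, in one function, rather than being
// derived from the key order of whatever object the caller passes in.
function toDataPoint(a: Analytics, index: string): DataPoint {
  return {
    blobs: [a.route, a.country],
    doubles: [a.cached ? 1 : 0, a.durationMs],
    indexes: [index],
  };
}

function write(dataset: Dataset, a: Analytics, index: string): void {
  dataset.writeDataPoint(toDataPoint(a, index));
}
```

Callers work with named, typed fields, and only `toDataPoint` knows which column each field lands in. New fields would be appended to the end of the arrays so existing columns keep their meaning.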

Simple answer: that just isn’t anywhere near as performant, especially at a large scale like CF where we may be doing millions of writes a second (see: HTTP Analytics for 6M requests per second using ClickHouse)

Okay, that has inspired me to create an extensible class, one per metric, each with a different strongly typed interface for label values. That gets around the maintenance and scalability issue.
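Roughly, something like the sketch below, which keeps the layout described earlier (label names in blobs, numeric values in doubles). Every name here (`Metric`, `CheckoutMetric`, `CheckoutLabels`, `columns`) is hypothetical, and `DataPoint` is a local stand-in for the shape passed to `writeDataPoint`.

```typescript
// Local stand-in for the object passed to writeDataPoint (assumed shape).
interface DataPoint {
  blobs: string[];
  doubles: number[];
  indexes: string[];
}

// Base class: each metric subclass declares its label type and a fixed
// column order. The order is written down once and only ever appended to;
// it is never derived from the key order of the labels object.
abstract class Metric<L extends Record<string, number>> {
  protected abstract readonly columns: readonly (keyof L & string)[];

  toDataPoint(labels: L, samplingIndex: string): DataPoint {
    return {
      blobs: [...this.columns],                          // label names
      doubles: this.columns.map((name) => labels[name]), // matching values
      indexes: [samplingIndex],
    };
  }
}

// Hypothetical metric with three numeric labels.
type CheckoutLabels = { items: number; total: number; retries: number };

class CheckoutMetric extends Metric<CheckoutLabels> {
  protected readonly columns = ["items", "total", "retries"] as const;
}

// Key order at the call site no longer matters.
const point = new CheckoutMetric().toDataPoint(
  { retries: 1, total: 42.5, items: 3 },
  "abc123",
);
console.log(point.doubles); // [3, 42.5, 1]
```

Because the compiler checks the labels object against `CheckoutLabels`, a missing or misspelled label fails at build time instead of silently shifting columns.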

However, we still need to track non-numerical values. Booleans I can get around by casting them to 0/1, but what about strings? All values will always be a primitive type.