Downloads counter

We run a game mods CDN that pushes tens of thousands of different files daily, and up to 150TB of bandwidth monthly. Cloudflare caches anywhere from 85-90% of this traffic (which is awesome), but I’m looking to implement a downloads tracker via Cloudflare Workers so we can get some specific statistics on how frequently files are accessed, and how many times each file is downloaded.

Because we get traffic from all over the world, I don’t think Workers KV will work well for us, as we’ll be wanting to increment a download counter pretty frequently, and the eventual consistency model will almost definitely leave us with some traffic being “lost” as we have to read, increment, and then write back to Workers KV. We’d essentially need globally atomic operations, which Workers KV doesn’t seem to be suited for at this time.
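
To make that failure mode concrete, here's a rough sketch (assuming a KV namespace bound as `COUNTS` and the types from `@cloudflare/workers-types`; both names are just for illustration) of the read-increment-write we'd be stuck with. Two colos running this concurrently can both read the same value, both add 1, and both write back the same result, so one download silently vanishes:

```ts
// Hypothetical KV binding for per-file download counts.
declare const COUNTS: KVNamespace;

addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handle(event));
});

async function handle(event: FetchEvent): Promise<Response> {
  const response = await fetch(event.request);
  const key = new URL(event.request.url).pathname;

  // Non-atomic read-increment-write: with KV's eventual consistency the
  // race window spans global replication delay, not just milliseconds,
  // so concurrent colos clobber each other's increments.
  const current = parseInt((await COUNTS.get(key)) ?? '0', 10);
  event.waitUntil(COUNTS.put(key, String(current + 1)));

  return response;
}
```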

So, I thought we could just make use of caching download numbers/times in global memory, and then send a request back to our server for database storage. However, due to the unpredictability of how some of these files are accessed (some constantly, and others only once or twice per day), I was wondering if there was any kind of event that a worker emits just before it’s about to shut down? This would give us the ability to cache download counts for as long as we could in memory, and then make a request to our server just before the worker completely exits. We could of course just make the request to our server every 10, 100, or 1000 requests, etc. but we’re trying to reduce the load on our origin as much as possible, as well as reduce the possibility of any traffic being “lost”. Does anyone have any suggestions for how best to approach this?
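
For reference, the "every N requests" fallback might look roughly like this (the flush threshold and the `https://origin.example/counts` endpoint are placeholders, not anything we've built). The only thing keeping the response fast here is `event.waitUntil`, which lets the report to our origin finish after the download response has already been returned:

```ts
// Placeholder stats endpoint on our origin (assumption).
const ORIGIN_REPORT_URL = 'https://origin.example/counts';

// Globals persist between requests only while this worker instance stays
// warm, and they are per-instance, not global across colos.
const pending = new Map<string, number>();
let buffered = 0;
const FLUSH_EVERY = 100; // tune per traffic pattern (assumption)

addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handle(event));
});

async function handle(event: FetchEvent): Promise<Response> {
  const response = await fetch(event.request);
  const file = new URL(event.request.url).pathname;

  pending.set(file, (pending.get(file) ?? 0) + 1);
  buffered++;

  if (buffered >= FLUSH_EVERY) {
    const batch = Object.fromEntries(pending);
    pending.clear();
    buffered = 0;
    // waitUntil keeps the worker alive until the report completes,
    // without delaying the download response itself.
    event.waitUntil(
      fetch(ORIGIN_REPORT_URL, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify(batch),
      }),
    );
  }

  return response;
}
```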

I was just on my way to look at Logflare when I saw this message…

Would something that can filter through your Cloudflare logs for these downloads help? I believe Logflare can do some filtering. I think the topic has come up recently, and I’m just now headed over to explore it.

https://logflare.app/

Logflare looks pretty neat, but all we really care about are 200s for each file, and cache hit/miss (if possible), to then report back to our server for storage and time-series representation. We do anywhere from 1-3M requests per day, so coalescing this data is definitely going to be a requirement. Caching the download counters in a Worker for its lifetime seems like a good solution, if we can guarantee the data is reported to our server before the worker closes.
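
For the bookkeeping side, something like this per-request recording is what I have in mind (I believe Cloudflare exposes the cache result via the `CF-Cache-Status` header on responses fetched through the cache, though it can be absent on uncacheable responses):

```ts
interface FileStats {
  downloads: number; // 200 responses only
  hits: number;      // served from Cloudflare's cache
  misses: number;    // went to our origin
}

const stats = new Map<string, FileStats>();

function record(file: string, response: Response): void {
  // Only successful downloads count, per the requirement above.
  if (response.status !== 200) return;

  const entry = stats.get(file) ?? { downloads: 0, hits: 0, misses: 0 };
  entry.downloads++;

  // CF-Cache-Status reports HIT, MISS, EXPIRED, etc.
  const cacheStatus = response.headers.get('cf-cache-status');
  if (cacheStatus === 'HIT') entry.hits++;
  else if (cacheStatus === 'MISS') entry.misses++;

  stats.set(file, entry);
}
```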

Using a mix of global variables and KV, you could implement several counters.
Note that KV is not very fast at writing, though.
Precision might be rough.
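
If I understand the suggestion, that would look something like this (the `COUNTS` binding name and flush threshold are made up). Note the KV write is itself still a read-modify-write, so colos can overwrite each other and the total stays approximate:

```ts
declare const COUNTS: KVNamespace; // assumed KV binding

let delta = 0;

addEventListener('fetch', (event: FetchEvent) => {
  delta++;
  // Fold the in-memory delta into KV occasionally instead of on every
  // request: far fewer KV writes, but only rough precision.
  if (delta >= 50) {
    const toWrite = delta;
    delta = 0;
    event.waitUntil(
      (async () => {
        const current = parseInt((await COUNTS.get('total')) ?? '0', 10);
        await COUNTS.put('total', String(current + toWrite));
      })(),
    );
  }
  event.respondWith(fetch(event.request));
});
```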

Do you really care that much about data loss? With 150TB of traffic, how much difference would it make if you lost 100-1,000 requests per month?
And a second question: do you need something for the long run, or only for a short period of time?

For a short period, and without caring too much about small data loss, I would just log all the requests to a third-party service like LogDNA, let them export the logs into S3, and then, depending on your needs, move the data into a more appropriate place like Elasticsearch or whatever.

It's also worth looking at Amazon Athena or Amazon Kinesis.

@cherryjimbo if you still want to do this I can help. Logflare can easily handle millions per day, and you can cheaply query the status code in BigQuery. Then just create a job on your end to do that query however frequently you want to update the data in a cache somewhere closer to your database.
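
A sketch of that job, using the official `@google-cloud/bigquery` client (the project/dataset/table and column names here are placeholders for wherever Logflare lands your events):

```ts
import { BigQuery } from '@google-cloud/bigquery';

const bigquery = new BigQuery();

// Placeholder table: substitute the dataset/table Logflare writes your
// Cloudflare events into, plus its actual column names.
const QUERY = `
  SELECT path, COUNT(*) AS downloads
  FROM \`my_project.logflare.cloudflare_logs\`
  WHERE status_code = 200
    AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  GROUP BY path
  ORDER BY downloads DESC
`;

async function refreshDownloadCounts(): Promise<void> {
  const [rows] = await bigquery.query({ query: QUERY });
  // Push the aggregated rows into a cache or table near the main database.
  for (const row of rows) {
    console.log(`${row.path}: ${row.downloads}`);
  }
}

refreshDownloadCounts().catch(console.error);
```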