Recording URL visits using Workers

I am trying to optimize how “most read” articles are tracked on a website. Initially, I programmatically incremented the visit count every time an article was visited. Now I am considering doing all of this on the Edge to reduce the number of database requests hitting the Origin.

An example of the scenario I have in mind would be as follows:

  1. Create a script on the Edge (using Workers) that saves the number of visits for each URL.

  2. Every X minutes, call a service on the Origin, send the saved/recorded visits, and then clear the counts.

  3. On the Origin, perform a batch update that increments all articles’ visit counts in one shot.

Looking forward to hearing your suggestions. Any better way of doing this?
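For concreteness, here is roughly what I picture steps 1 and 2 looking like in a Worker. This is just a sketch: the /bulk-visits endpoint and origin hostname are made up, the counts only live in isolate memory, and something still has to trigger the flush, which is exactly the part I have not figured out.

```ts
// Sketch of steps 1 and 2. The Map lives only in the memory of a single
// Worker isolate, so it is not a durable store.
const counts = new Map<string, number>();

addEventListener('fetch', (event: FetchEvent) => {
  const url = new URL(event.request.url);
  // Step 1: count the visit for this URL at the Edge.
  counts.set(url.pathname, (counts.get(url.pathname) ?? 0) + 1);
  event.respondWith(fetch(event.request));
});

// Step 2: something would have to call this every X minutes,
// which is the open question.
async function flushToOrigin(): Promise<void> {
  if (counts.size === 0) return;
  const body = JSON.stringify(Object.fromEntries(counts));
  // Clearing before the POST keeps the example simple; a real version
  // should handle send failures.
  counts.clear();
  await fetch('https://origin.example.com/bulk-visits', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body,
  });
}
```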

I think the devil is in the details; how are you going to save the number of visits? How are you going to make a call “every x minutes”?


Well, at first, I thought of handling those two challenges on the Origin:

How are you going to save the number of visits?

By saving the URL segment and the number of visits in a file on the server. Something like:

URL     |   visits
URL1    |   1
URL2    |   5
URL3    |   20

How are you going to make a call “every X minutes”?

I thought of a cronjob that calls a service or handler on the Origin. This service/handler reads the data from the file, updates the database in a single shot, and clears the file.
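Roughly, the handler the cron job hits could look like this. It is only a sketch: it assumes the visits.log layout shown above and uses a generic db.execute() placeholder for whatever database client is in use; real code should use parameterized queries instead of string concatenation.

```ts
import { readFile, writeFile } from 'node:fs/promises';

// Read the visits file, apply all increments in one UPDATE, then clear the file.
async function flushVisitsFile(db: { execute(sql: string): Promise<void> }): Promise<void> {
  const raw = await readFile('visits.log', 'utf8');

  // Parse "URL | visits" lines into { url, visits } pairs, skipping the header row.
  const rows = raw
    .split('\n')
    .slice(1)
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const [url, visits] = line.split('|').map((part) => part.trim());
      return { url, visits: Number(visits) };
    });

  if (rows.length === 0) return;

  // One statement increments every article at once (use bind parameters in real code).
  const cases = rows.map((r) => `WHEN '${r.url}' THEN visits + ${r.visits}`).join(' ');
  const urls = rows.map((r) => `'${r.url}'`).join(', ');
  await db.execute(
    `UPDATE articles SET visits = CASE url ${cases} ELSE visits END WHERE url IN (${urls})`
  );

  // Reset the file so the next window starts from zero.
  await writeFile('visits.log', 'URL     |   visits\n');
}
```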

Then I wondered whether I could do all of this on the Edge using Workers instead.

Ah okay, so you’re asking for advice about how to do this? I thought you had a plan and were asking how good of a plan it was. No worries 🙂

Workers do not have the ability to save files, and there are no cron jobs, so you can’t copy that plan exactly. That’s why I was curious how you’d do it!

In the end, I am not sure that you can do this with Workers, though maybe someone else has ideas for you 🙂


As there is no KV increment function (please, please, @sklabnik, add an increment function, and make it exempt from the one-write-per-second limit), I am now planning to implement URL visit recording roughly like this:

  1. Have a namespace dedicated to visit records.
  2. On every visit, put a record there with key: ray-id and value: visited URL.
  3. Set up a cron that regularly sends a request (every few minutes? I wonder whether it could be called every minute without a risk of conflicts) to a separate cron worker.
  4. The cron worker would read the namespace contents, build an in-memory map keyed by visited URL containing the number of visits, and then add those values to the stored visit count for each URL (I store all my data in KV).
  5. The cron worker would then batch-delete all the KV records it has processed.

I have no idea whether this solution will work, whether it is scalable (maybe it will need some sharding?), or how many visits per minute it would be able to process. I am planning to implement it soon, roughly as sketched below; then we will see…
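Very roughly, I picture the cron worker looking something like this. It is just a sketch: VISIT_LOG and COUNTS are made-up binding names for the visit-record namespace and the per-URL totals, and the worker is triggered by an external cron hitting it over HTTP.

```ts
declare const VISIT_LOG: KVNamespace; // key: ray-id, value: visited URL
declare const COUNTS: KVNamespace;    // key: URL, value: running visit total

addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handleCron());
});

async function handleCron(): Promise<Response> {
  // Step 4: read the raw visit records and fold them into an in-memory tally.
  const tally = new Map<string, number>();
  const processed: string[] = [];
  let cursor: string | undefined;
  do {
    const page = await VISIT_LOG.list({ cursor });
    for (const { name } of page.keys) {
      const url = await VISIT_LOG.get(name);
      if (url !== null) {
        tally.set(url, (tally.get(url) ?? 0) + 1);
        processed.push(name);
      }
    }
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);

  // Add the deltas to the per-URL totals stored in KV (the one-write-per-
  // second-per-key limit applies to each of these puts).
  for (const [url, delta] of tally) {
    const current = Number((await COUNTS.get(url)) ?? '0');
    await COUNTS.put(url, String(current + delta));
  }

  // Step 5: delete the processed records. The runtime KV API has no bulk
  // delete, so this loops one key at a time.
  for (const name of processed) {
    await VISIT_LOG.delete(name);
  }

  return new Response(`flushed ${processed.length} visit records`);
}
```

Reading and deleting every record one by one is probably where the scalability concern comes in, hence the thought about sharding.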


I like this solution. If you are interested in open source, we could create this as a library so everyone can plug it into their own solution (a library on GitHub, for example).

In my case, I am implementing this on a blog. I often get a large number of visits in the same second for the same URL (due to push notifications). If I incremented the article count on every hit, I would end up with connection timeout errors. So the idea of saving the IDs and running a batch update would be really effective.

We’ll have something eventually; it’s just very non-trivial! Thanks for mentioning me though. Adding this to the pile of requests…


Hey Rami! You can use Logflare to log all request metadata to a BigQuery table and then just query that periodically and cache those results in your local database.
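Something along these lines, for example. The project, dataset, table, and column names below are only illustrative; check the schema Logflare actually creates for your account and adjust the query accordingly.

```ts
import { BigQuery } from '@google-cloud/bigquery';

// Periodically pull visit counts per URL out of the Logflare-managed table
// and cache them wherever the "most read" widget reads from.
async function fetchTopArticles(sinceHours = 1): Promise<Array<{ url: string; visits: number }>> {
  const bigquery = new BigQuery();
  const query = `
    SELECT metadata.request.url AS url, COUNT(*) AS visits
    FROM \`my_project.my_dataset.cloudflare_logs\`
    WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL ${sinceHours} HOUR)
    GROUP BY url
    ORDER BY visits DESC
    LIMIT 50`;
  const [rows] = await bigquery.query({ query });
  return rows as Array<{ url: string; visits: number }>;
}
```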