CPU time is far over the limit but there is no error message

I’m trying fuzzy searching in Cloudflare Workers. Fuzzy searching takes time and memory, so I did some tests with Fuse.js.

At first I loaded fewer than 500 objects from jsonplaceholder. I searched through them and the Cloudflare dashboard said the median CPU time was ~40ms (up to +70ms). But my limit is 10ms on the free plan.

So I went one step further and loaded ~6000 objects (posts, comments, photos, todos) and searched using different keywords, from a few letters to whole sentences. The dashboard wasn’t updating instantly, but the response time got significantly worse, by ~80-125ms. Yet there was no error message saying I had exceeded the CPU limit, even after at least +100ms of CPU time.
Is this a bug or something unexpected? Or am I measuring the wrong way?

Nobody has replied, but as I recall, you’re allowed to burst CPU time. Though if it happens enough, you’ll start to get the warnings.

Ah as suspected. Thanks for the tip. But there is no data on what is enough, right?

Not yet. :wink:

I’m working on a similar use case, and it seems that Fuse.js isn’t very fast at all:

https://raw.githack.com/nextapps-de/flexsearch/master/test/benchmark.html

Might be worth looking at flexsearch as an alternative implementation to reduce CPU usage.

I’m having a go at implementing it now.

I’ve tried flexsearch and it can handle quite a lot of objects, about 3MB worth of pre-generated indexed text, but you’ll only stay beneath the CPU limit at ~2MB when there’s consistent traffic.

I’d try something like this instead:

Should be able to do up to at least 5MB of data.

@thomas4 - how did you get flexsearch to even run in the worker? No matter how I add it I end up with “Script startup timed out.” when I run “wrangler preview”.

Currently I’ve added flexsearch as a dependency in package.json and call it in the worker with const FlexSearch = require("flexsearch");

My object that needs to be searched is ~60k lines at ~7MB, so it’s not too big.

Make sure to use the latest version and tree-shaking, so that you include only the modules you actually need.

You can see in the comparison that the resulting package is tiny.

An interesting observation here: if IndexedDB hadn’t been removed from Workers, it would have been very useful for just this specific purpose, like an embedded database.

Indeed. The problem is that large array datasets require an index in order to be searchable and return results within the CF Worker time limits.

But it’s impossible to build the index within the Worker time limits. Even importing a prebuilt index trips the threshold too.

I reduced my dataset down to the bare minimum of 58k lines at 1.4MB uncompressed. Using Fuse I created a prebuilt index (docs) which is 2.7MB uncompressed.

In my browser it takes 114ms to read in the index file and do a JSON.stringify. That is significantly more than the CF Worker CPU limit, even on the Enterprise plan, and before any other operations are performed.
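
For anyone who wants to reproduce that kind of measurement locally, here is a rough sketch. It uses synthetic records rather than a real Fuse index, and `makeRecords` and `timeJsonRoundTrip` are made-up names for illustration. Keep in mind that wall-clock time in a browser or Node is only a rough proxy for the CPU time a Worker is charged.

```javascript
// Hypothetical helper: builds synthetic records standing in for a real dataset.
function makeRecords(count) {
  const records = [];
  for (let i = 0; i < count; i++) {
    records.push({ id: i, title: `item ${i}`, body: "lorem ipsum dolor sit amet" });
  }
  return records;
}

// Times JSON.stringify and JSON.parse separately using wall-clock time.
function timeJsonRoundTrip(records) {
  const t0 = performance.now();
  const text = JSON.stringify(records);
  const t1 = performance.now();
  const parsed = JSON.parse(text);
  const t2 = performance.now();
  return {
    count: parsed.length,
    bytes: new TextEncoder().encode(text).length, // UTF-8 size of the payload
    stringifyMs: t1 - t0,
    parseMs: t2 - t1,
  };
}

const result = timeJsonRoundTrip(makeRecords(58000));
console.log(
  `${result.count} records, ${(result.bytes / 1e6).toFixed(2)} MB, ` +
    `stringify ${result.stringifyMs.toFixed(1)} ms, parse ${result.parseMs.toFixed(1)} ms`
);
```

The numbers will vary a lot between machines, so treat them as a relative comparison between dataset sizes and not as a prediction of what the dashboard will show.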

I guess that even if IndexedDB were enabled, the initial creation of the index from the array, or importing a prebuilt index into IndexedDB, would still exceed the Worker limit, even though a load of around 100ms is tiny.

Looks like I will have to use an alternative solution for searching large datasets that isn’t CF Workers. That’s a real shame as I was hoping to avoid having to provide other origin types for this autocomplete use case. Being able to perform fast search functions against a large dataset at the network edge with seamless scalability is a killer function - if it worked!!

For my use case, which is just small-site search, I created a Lambda worker that creates the index and uploads it to Workers KV. The index can’t be bigger than 1.5MB or the CPU time runs out. I’m waiting until they release Workers Unbound, so I can just move the Lambda. You should try edgesearch, because it will be able to handle much more data (as you can see in the demos). With WASM being able to run 10x faster, this should be very responsive now, even fast enough for autocomplete.
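
A size guard like the one below could run in the build step before the upload, so an oversized index fails early instead of timing out in the Worker. This is only a sketch: the 1.5MB budget is the figure from the post above, and `serializedBytes` and `checkIndexSize` are hypothetical names, not part of any library.

```javascript
// Assumed budget, taken from the figure quoted in the post: about 1.5MB.
const MAX_INDEX_BYTES = 1.5 * 1024 * 1024;

// Returns the UTF-8 byte size of a value once serialized as JSON.
function serializedBytes(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Hypothetical guard: reports the serialized size and whether it fits the budget.
function checkIndexSize(index, maxBytes = MAX_INDEX_BYTES) {
  const bytes = serializedBytes(index);
  return { bytes, ok: bytes <= maxBytes };
}

// Example with a tiny stand-in index.
const report = checkIndexSize({ docs: ["hello", "world"] });
console.log(`${report.bytes} bytes, within budget: ${report.ok}`);
```

The right threshold depends on the index format and traffic, so the constant is something to tune against your own measurements.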

I would keep in mind, though, that you’ll rack up a significant number of requests if you don’t rate-limit searches per IP.
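
As a minimal sketch of per-IP rate limiting, here is a fixed-window counter. `FixedWindowLimiter` is a made-up name, and the state lives in memory only, which in a Worker means it is per isolate and not shared globally, so this illustrates the idea and is not a complete solution.

```javascript
// Fixed-window counter keyed by client IP. State is in-memory only (assumption:
// acceptable for a sketch; real deployments need shared state across instances).
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit; // max requests per window
    this.windowMs = windowMs; // window length in milliseconds
    this.windows = new Map(); // ip -> { start, count }
  }

  // Returns true if the request is allowed, false if the IP is over its limit.
  allow(ip, now = Date.now()) {
    const entry = this.windows.get(ip);
    if (!entry || now - entry.start >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}

// Example: at most 2 searches per second per IP.
const limiter = new FixedWindowLimiter(2, 1000);
console.log(limiter.allow("203.0.113.7", 0)); // first request in the window
```

In a Worker the client address is available in the `CF-Connecting-IP` request header, which could be passed as the `ip` argument.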

Thought I’d update after some new tests I did without any search/index feature, just plain JSON from KV and native features to sort and find objects.

Just a simple find-by-ID in an array will top out at 50ms with 1000 items and 1.7MB.
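
If several lookups hit the same parsed data, building a `Map` keyed by ID once may help, since each lookup then avoids scanning the array. This is a sketch that assumes every item has a unique `id` field; `buildIdIndex` and `findByScan` are hypothetical names. It does nothing about the cost of parsing the 1.7MB of JSON in the first place, which may be where much of the time goes.

```javascript
// Builds the lookup table once per loaded dataset: id -> item.
// Assumes each item has a unique `id` property.
function buildIdIndex(items) {
  const byId = new Map();
  for (const item of items) {
    byId.set(item.id, item);
  }
  return byId;
}

// Linear scan for comparison: walks the array until the id matches.
function findByScan(items, id) {
  return items.find((item) => item.id === id);
}

// Example with a tiny stand-in dataset.
const items = [
  { id: 1, name: "first" },
  { id: 2, name: "second" },
];
const byId = buildIdIndex(items);
console.log(byId.get(2) === findByScan(items, 2));
```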

Also worth noting: loading the 1.7MB KV value from cold storage (a new KV) takes ~100ms, which is a huge improvement over the 1-2 seconds it used to take.
