I’m experiencing unexpectedly slow insert performance with Cloudflare Vectorize during a large-scale vector insertion. Over 12 hours, I successfully inserted about 2.5 million documents individually or in very small groups (1-2 vectors at a time). However, after about 36 hours, my process is still at around 1.9 million vectors total.
It appears that Vectorize is batching inserts at about 1,000 vectors each, rather than the advertised batches of up to 200,000 vectors for improved throughput. My understanding was that Vectorize would automatically batch inserts at these larger sizes to optimize performance, but this doesn’t seem to be happening.
Do I need to explicitly batch my inserts (e.g., in groups of 5,000 vectors) to achieve better efficiency, or is there something else going on here? Could anyone from Cloudflare clarify how batching works internally with Vectorize and suggest best practices or architecture adjustments for optimizing large-scale vector insert operations?
What steps have you taken to resolve the issue?
All I can do at this point is wait for the insert to complete.
Where are you seeing 200,000 listed as a batch upload limit?
If your process is still running, please open a Support issue and share the Account ID with the issue, and we’ll be glad to see why it’s taking so long.
When I say 200k I am referencing this quote from a Cloudflare blog post:
“It will batch up to 200,000 vectors at once (a value we arrived at after our own testing) with a limit of 1,000 blocks. With this throughput, we have been able to quickly load millions of vectors into an index (with upserts of 5,000 vectors at a time).”
So internally, Vectorize can supposedly batch up to 200,000 vectors at once using its WAL (write-ahead log). What I didn’t realize is that this only seems to happen when you submit inserts in batches of 5,000. If you insert vectors one by one, the process is much slower. I recreated the database, inserting 5,000 vectors at a time into two indexes of 2 million vectors each, and it took about 12 hours. I’m not sure if that counts as “quickly”, but it was good enough for me.
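For anyone landing here with the same problem, here is a minimal sketch of the client-side batching I mean. Note that `send_batch` is a hypothetical placeholder for whatever call you actually use to submit vectors to Vectorize (it is not a real Vectorize API), and the default of 5,000 is just the batch size mentioned in the blog post quote above, so check the current limits for your own setup.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upsert_in_batches(
    vectors: Iterable[T],
    send_batch: Callable[[List[T]], object],
    batch_size: int = 5000,
) -> int:
    """Submit `vectors` in groups of `batch_size`; return the number of batches sent.

    `send_batch` is a hypothetical callable standing in for the real
    upsert request (an assumption, not part of any Vectorize SDK).
    """
    sent = 0
    for batch in chunked(vectors, batch_size):
        send_batch(batch)
        sent += 1
    return sent
```

The point is simply that each request carries thousands of vectors instead of one or two, so the service has far fewer requests to process for the same amount of data.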
Generally I would recommend Vectorize. It’s quick once you get it set up. Just make sure that you keep the metadata and embeddings you’re inserting into it in case you run into an issue like I had.
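As a rough illustration of keeping a local copy, something like the sketch below would do. The file layout (one JSON object per line) and the record fields `id`, `values`, and `metadata` are my own assumptions for the example, and the helper names are hypothetical; adapt them to however you store your data.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator


def append_records(path: str, records: Iterable[Dict]) -> int:
    """Append each record as one JSON line to `path`; return how many were written."""
    written = 0
    with Path(path).open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            written += 1
    return written


def load_records(path: str) -> Iterator[Dict]:
    """Read the records back, one per non-empty line, e.g. to rebuild an index."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

With a backup like this, rebuilding an index is just a matter of reading the file back and re-inserting in large batches, rather than recomputing the embeddings.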