What is the name of the model you’re running?
AutoRAG
What is the issue or error you’re encountering?
The neuron count reported for the text generation model does not match the pricing documentation.
What are the steps to reproduce the issue?
I tested the AutoRAG feature following the steps in the official video.
I created a bucket and uploaded 15 plain text files, around 400K in total.
I used all the default settings: @cf/meta/llama-3.3-70b-instruct-fp8-fast
for text generation, AI Gateway with cache enabled, query rewriting, and so on.
I ran around 20 queries in the playground.
The Workers AI dashboard says the text generation model consumed:
- 61.65k input tokens
- 3.8k output tokens
for a total of 309.06k neurons.
That does not match what the documentation for the @cf/meta/llama-3.3-70b-instruct-fp8-fast
model says:
- 26668 neurons per M input tokens
- 204805 neurons per M output tokens
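To make the discrepancy concrete, here is the arithmetic using the documented rates and the token counts from the dashboard (just a sketch of the expected cost; Cloudflare's actual billing formula may include factors not covered here, such as the embedding or query-rewrite models):

```python
# Expected neuron cost from the documented per-million-token rates
# and the token counts shown in the Workers AI dashboard.
INPUT_RATE = 26668 / 1_000_000    # neurons per input token
OUTPUT_RATE = 204805 / 1_000_000  # neurons per output token

input_tokens = 61_650   # 61.65k input tokens from the dashboard
output_tokens = 3_800   # 3.8k output tokens from the dashboard

expected = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
reported = 309_060      # 309.06k neurons shown in the dashboard

print(f"expected: {expected:,.0f} neurons")   # ≈ 2,422 neurons
print(f"reported: {reported:,} neurons")
print(f"ratio: {reported / expected:.0f}x")   # ≈ 128x
```

So by the documented rates, these token counts should cost roughly 2.4k neurons, yet the dashboard reports about 128 times that.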
My token consumption is on the order of thousands, yet the dashboard reports a neuron count that, at the documented rates, would correspond to hundreds of millions of tokens.
Does anyone have any insight into what could be happening? Am I missing something?
Thank you in advance.
P.S.: This is the first and only project using Workers AI and AutoRAG in this account.