What is the name of the model you’re running?
mistral-small-3.1-24b-instruct
What is the error number?
3030
What is the error message?
3030: 3 validation errors for ValidatorIterator 0.typed-dict Input should be a valid dictionary [type=dict_type, input_value='type', input_type=str]
What is the issue or error you’re encountering?
I can't figure out exactly what the API expects when doing vision tasks with mistral-small-3.1-24b-instruct.
What steps have you taken to resolve the issue?
So I have successfully used @cf/meta/llama-3.2-11b-vision-instruct for a vision task, and I'm now trying to use @cf/mistralai/mistral-small-3.1-24b-instruct, since the Cloudflare page claims it is a state-of-the-art model for language and vision: mistral-small-3.1-24b-instruct · Cloudflare Workers AI docs
I've also used the mistralai model purely for language tasks.
The main roadblock is that I cannot figure out the API format. The only documentation included says that the message content should look like this:
{
  type (string): Type of the content provided
  text (string):
  image_url: { url (string): image URI with data (e.g. data:image/jpeg;base64,/9j/...). HTTP URL will not be accepted }
}
It doesn't really say what's expected to go in type, though.
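For reference, here is my reading of that schema as a TypeScript shape. The interface name and the optional markers are my own, the field names come straight from the docs, and the possible values for type are just a guess on my part:

// My reading of the documented content schema (field names from the docs, comments are my guesses).
interface ContentPart {
  type: string;                  // "Type of the content provided" -- maybe "text" | "image_url"?
  text?: string;
  image_url?: { url: string };   // data URI only, e.g. data:image/jpeg;base64,... (HTTP URLs not accepted)
}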
Here is my whole code, which calls Llama and then Mistral for comparison. The Llama code does work.
// GET the url
const response = await fetch(URL)
// XXXXX (mtourne): this only needs to be done once, to accept the license.
// Note : https://developers.cloudflare.com/workers-ai/models/llama-3.2-11b-vision-instruct/
// To use Llama 3.2 11b Vision Instruct, you need to agree to the Meta License and Acceptable Use Policy . To do so, please send an initial request to @cf/meta/llama-3.2-11b-vision-instruct with "prompt" : "agree". After that, you'll be able to use the model as normal.
//
// await c.env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
// prompt: "agree"
// });
const blob = await response.arrayBuffer();
const blob_u8 = new Uint8Array(blob)
function encodeBase64Bytes(bytes: Uint8Array): string {
  return btoa(
    bytes.reduce((acc, current) => acc + String.fromCharCode(current), "")
  );
}
const image = [...blob_u8];
const results = await c.env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
  messages: [
    {
      role: "system",
      content: "You are an expert at labeling images, tell me what you see in the following image."
    },
  ],
  image: image,
});
await ctx.reply("Response Llama vision:")
await ctx.reply(results['response'])
const content_type = 'image/jpeg';
const image_b64 = encodeBase64Bytes(blob_u8);
const uri_encoded_image = `data:${content_type};base64,${image_b64}`
// XX (mtourne): we're trying to send the image to Mistral, which should also be capable of vision reasoning, but I can't figure out the API format
const results2 = await c.env.AI.run("@cf/mistralai/mistral-small-3.1-24b-instruct", {
  messages: [
    {
      role: "system",
      content: "You are an expert at labeling images, tell me what you see in the following image."
    },
    {
      role: "user",
      content: {
        type: content_type,
        text: "Here is the image",
        image_url: { url: uri_encoded_image },
      },
    },
  ],
});
await ctx.reply("Response Mistral 24b:")
return ctx.reply(results2['response']);
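Reading the validation error again ("Input should be a valid dictionary" while iterating over what look like the string keys of my content object), my current guess is that content should be an array of typed parts rather than a single object, similar to the OpenAI-style multimodal message format. This is the unverified variant I plan to try next; the "text" / "image_url" values for type are my assumption, not something the docs state:

// Unverified guess: content as an array of typed parts instead of a single object.
const results3 = await c.env.AI.run("@cf/mistralai/mistral-small-3.1-24b-instruct", {
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Here is the image" },
        { type: "image_url", image_url: { url: uri_encoded_image } },
      ],
    },
  ],
});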