Mistral-Small-3.2-24B-Instruct-2506
Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.
Small-3.2 improves in the following categories:
- Instruction following: Small-3.2 is better at following precise instructions
- Repetition errors: Small-3.2 produces less infinite generations or repetitive answers
- Function calling: Small-3.2's function calling template is more robust (see here and examples)
In all other categories Small-3.2 should match or slightly improve compared to Mistral-Small-3.1-24B-Instruct-2503.
Key Features
Benchmark Results
We compare Mistral-Small-3.2-24B to Mistral-Small-3.1-24B-Instruct-2503. For more comparison against other models of similar size, please check Mistral-Small-3.1's Benchmarks'
Text
Instruction Following / Chat / Tone
| Model | Wildbench v2 | Arena Hard v2 | IF (Internal; accuracy) |
|---|---|---|---|
| Small 3.1 24B Instruct | 55.6% | 19.56% | 82.75% |
| Small 3.2 24B Instruct | 65.33% | 43.1% | 84.78% |
Infinite Generations
Small 3.2 reduces infinite generations by 2x on challenging, long and repetitive prompts.
| Model | Infinite Generations (Internal; Lower is better) |
|---|---|
| Small 3.1 24B Instruct | 2.11% |
| Small 3.2 24B Instruct | 1.29% |
STEM
| Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT ) | MBPP Plus - Pass@5 | HumanEval Plus - Pass@5 | SimpleQA (TotalAcc) |
|---|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.63% | 88.99% | 10.43% |
| Small 3.2 24B Instruct | 80.50% | 69.06% | 69.42% | 44.22% | 46.13% | 78.33% | 92.90% | 12.10% |
Vision
| Model | MMMU | Mathvista | ChartQA | DocVQA | AI2D |
|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 64.00% | 68.91% | 86.24% | 94.08% | 93.72% |
| Small 3.2 24B Instruct | 62.50% | 67.09% | 87.4% | 94.86% | 92.91% |
Usage
The model can be used with the following frameworks;
vllm (recommended): See heretransformers: See here
Note 1: We recommend using a relatively low temperature, such as temperature=0.15.
Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend to use the one provided in the SYSTEM_PROMPT.txt file.
vLLM (recommended)
We recommend using this model with vLLM.
Installation
Make sure to install vLLM >= 0.9.1:
pip install vllm --upgrade
Doing so should automatically install mistral_common >= 1.6.2.
To check:
python -c "import mistral_common; print(mistral_common.__version__)"
You can also make use of a ready-to-go docker image or on the docker hub.
Serve
We recommend that you use Mistral-Small-3.2-24B-Instruct-2506 in a server/client setting.
- Spin up a server:
vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
--tokenizer_mode mistral --config_format mistral \
--load_format mistral --tool-call-parser mistral \
--enable-auto-tool-choice --limit-mm-per-prompt '{"image":10}' \
--tensor-parallel-size 2
Note: Running Mistral-Small-3.2-24B-Instruct-2506 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
- To ping the client you can use a simple Python snippet. See the following examples.
Vision reasoning
Leverage the vision capabilities of Mistral-Small-3.2-24B-Instruct-2506 to make the best choice given a scenario, go catch them all !
Function calling
Mistral-Small-3.2-24B-Instruct-2506 is excellent at function / tool calling tasks via vLLM. E.g.:
Instruction following
Mistral-Small-3.2-24B-Instruct-2506 will follow your instructions down to the last letter !
Transformers
You can also use Mistral-Small-3.2-24B-Instruct-2506 with Transformers !
To make the best use of our model with Transformers make sure to have installed mistral-common >= 1.6.2 to use our tokenizer.
pip install mistral-common --upgrade
Then load our tokenizer along with the model and generate:
- Downloads last month
- 513,782
Model tree for mistralai/Mistral-Small-3.2-24B-Instruct-2506
Base model
mistralai/Mistral-Small-3.1-24B-Base-2503