Includes our GGUF chat template fixes! Tool calling works as well!
If you are using llama.cpp, use --jinja to enable the system prompt.

Unsloth Dynamic 2.0 achieves SOTA performance in model quantization.

👁 Image
👁 Image
👁 Image

✨ How to Use Mistral 3.2 Small:

Run in llama.cpp:

./llama.cpp/llama-cli -hf unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:UD-Q4_K_XL --jinja --temp 0.15 --top-k -1 --top-p 1.00 -ngl 99

Run in Ollama:

ollama run hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:UD-Q4_K_XL

Temperature of: 0.15
Set top_p to: 1.00
Max tokens (context length): 128K
Fine-tune Mistral v0.3 (7B) for free using our Google Colab notebook here!
View the rest of our notebooks in our docs here.

Mistral-Small-3.2-24B-Instruct-2506

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Small-3.2 improves in the following categories:

Instruction following: Small-3.2 is better at following precise instructions
Repetition errors: Small-3.2 produces less infinite generations or repetitive answers
Function calling: Small-3.2's function calling template is more robust (see here and examples)

In all other categories Small-3.2 should match or slightly improve compared to Mistral-Small-3.1-24B-Instruct-2503.

Key Features

same as Mistral-Small-3.1-24B-Instruct-2503

Benchmark Results

We compare Mistral-Small-3.2-24B to Mistral-Small-3.1-24B-Instruct-2503. For more comparison against other models of similar size, please check Mistral-Small-3.1's Benchmarks'

Text

Instruction Following / Chat / Tone

Model	Wildbench v2	Arena Hard v2	IF (Internal; accuracy)
Small 3.1 24B Instruct	55.6%	19.56%	82.75%
Small 3.2 24B Instruct	65.33%	43.1%	84.78%

Infinite Generations

Small 3.2 reduces infitine generations by 2x on challenging, long and repetitive prompts.

Model	Infinite Generations (Internal; Lower is better)
Small 3.1 24B Instruct	2.11%
Small 3.2 24B Instruct	1.29%

STEM

Model	MMLU	MMLU Pro (5-shot CoT)	MATH	GPQA Main (5-shot CoT)	GPQA Diamond (5-shot CoT )	MBPP Plus - Pass@5	HumanEval Plus - Pass@5	SimpleQA (TotalAcc)
Small 3.1 24B Instruct	80.62%	66.76%	69.30%	44.42%	45.96%	74.63%	88.99%	10.43%
Small 3.2 24B Instruct	80.50%	69.06%	69.42%	44.22%	46.13%	78.33%	92.90%	12.10%

Vision

Model	MMMU	Mathvista	ChartQA	DocVQA	AI2D
Small 3.1 24B Instruct	64.00%	68.91%	86.24%	94.08%	93.72%
Small 3.2 24B Instruct	62.50%	67.09%	87.4%	94.86%	92.91%

Usage

The model can be used with the following frameworks;

vllm (recommended): See here
transformers: See here

Note 1: We recommend using a relatively low temperature, such as temperature=0.15.

Note 2: Make sure to add a system prompt to the model to best tailer it for your needs. If you want to use the model as a general assistant, we recommend to use the one provided in the SYSTEM_PROMPT.txt file.

vLLM (recommended)

We recommend using this model with vLLM.

Installation

Make sure to install vLLM >= 0.9.1:

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.6.2.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"

You can also make use of a ready-to-go docker image or on the docker hub.

Serve

We recommand that you use Mistral-Small-3.2-24B-Instruct-2506 in a server/client setting.

Spin up a server:

vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2

Note: Running Mistral-Small-3.2-24B-Instruct-2506 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.

To ping the client you can use a simple Python snippet. See the following examples.

Vision reasoning

Take leverage of the vision capabilities of Mistral-Small-3.2-24B-Instruct-2506 to take the best choice given a scenario, go catch them all !

Function calling

Mistral-Small-3.2-24B-Instruct-2506 is excellent at function / tool calling tasks via vLLM. E.g.:

Instruction following

Mistral-Small-3.2-24B-Instruct-2506 will follow your instructions down to the last letter !

Transformers

You can also use Mistral-Small-3.2-24B-Instruct-2506 with Transformers !

To make the best use of our model with Transformers make sure to have installed mistral-common >= 1.6.2 to use our tokenizer.

pip install mistral-common --upgrade

Then load our tokenizer along with the model and generate:

Downloads last month: 28,745

GGUF

Model size

24B params

Architecture

llama

Hardware compatibility

1-bit

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Model tree for unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

Base model

mistralai/Mistral-Small-3.1-24B-Base-2503

Finetuned

mistralai/Mistral-Small-3.2-24B-Instruct-2506

Quantized

(62)

this model

Quantizations

1 model

Collections including unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

A collection of Mistral's new Small 3.2 and 3 models including GGUF, 4-bit and more! • 20 items • Updated 4 days ago • 22

New 2.0 version of our Dynamic GGUF + Quants. Dynamic 2.0 achieves superior accuracy & SOTA quantization performance. • 106 items • Updated 1 day ago • 718

URL: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

⇱ unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF · Hugging Face