Ministral 3 14B Reasoning 2512

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.

This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.

The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 32GB of VRAM in BF16, and less than 24GB of RAM/VRAM when quantized.

Learn more in our blog post and paper.

Key Features

Ministral 3 14B consists of two main architectural components:

13.5B Language Model
0.4B Vision Encoder

The Ministral 3 14B Reasoning model offers the following capabilities:

Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
System Prompt: Maintains strong adherence and support for system prompts.
Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
Reasoning: Excels at complex, multi-step reasoning and dynamic problem-solving.
Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
Large Context Window: Supports a 256k context window.

Use Cases

Private AI deployments where advanced capabilities meet practical hardware constraints:

Private/custom chat and AI assistant deployments in constrained environments
Advanced local agentic use cases
Fine-tuning and specialization
And more...

Bringing advanced AI capabilities to most environments.

Recommended Settings

We recommend deploying with the following best practices:

System Prompt: Use our provided system prompt, and append it to your custom system prompt to define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
Multi-turn Traces: We highly recommend keeping the reasoning traces in context.
Sampling Parameters: Use a temperature of 1 for most environments ; Different temperatures may be explored for different use cases - developers are encouraged to experiment with alternative settings.
Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - Avoiding overloading the model with an excessive number of tools.
Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoiding the use of overly thin or wide images - crop them as needed to ensure optimal performance.

Ministral 3 Family

Model Name	Type	Precision	Link
Ministral 3 3B Base 2512	Base pre-trained	BF16	Hugging Face
Ministral 3 3B Instruct 2512	Instruct post-trained	FP8	Hugging Face
Ministral 3 3B Reasoning 2512	Reasoning capable	BF16	Hugging Face
Ministral 3 8B Base 2512	Base pre-trained	BF16	Hugging Face
Ministral 3 8B Instruct 2512	Instruct post-trained	FP8	Hugging Face
Ministral 3 8B Reasoning 2512	Reasoning capable	BF16	Hugging Face
Ministral 3 14B Base 2512	Base pre-trained	BF16	Hugging Face
Ministral 3 14B Instruct 2512	Instruct post-trained	FP8	Hugging Face
Ministral 3 14B Reasoning 2512	Reasoning capable	BF16	Hugging Face

Other formats available here.

Benchmark Results

We compare Ministral 3 to similar sized models.

Reasoning

Model	AIME25	AIME24	GPQA Diamond	LiveCodeBench
Ministral 3 14B
Qwen3-14B (Thinking)	0.737	0.837	0.663	0.593
Ministral 3 8B	0.787	0.668
Qwen3-VL-8B-Thinking	0.580
Ministral 3 3B	0.534
Qwen3-VL-4B-Thinking	0.697	0.729	0.513

Instruct

Model	Arena Hard	WildBench	MATH Maj@1	MM MTBench
Ministral 3 14B
Qwen3 14B (Non-Thinking)	0.427	65.1	0.870	NOT MULTIMODAL
Gemma3-12B-Instruct	0.436	63.2	0.854	6.70
Ministral 3 8B	0.509	0.876
Qwen3-VL-8B-Instruct	66.3	8.00
Ministral 3 3B	0.305	0.830	7.83
Qwen3-VL-4B-Instruct
Qwen3-VL-2B-Instruct	0.163	42.2	0.786	6.36
Gemma3-4B-Instruct	0.318	49.1	0.759	5.23

Base

Model	Multilingual MMLU	MATH CoT 2-Shot	AGIEval 5-shot	MMLU Redux 5-shot	MMLU 5-shot
Ministral 3 14B	0.742	0.648	0.820	0.794	0.749
Qwen3 14B Base	0.620	0.703
Gemma 3 12B Base	0.690	0.487	0.587	0.766	0.745
Ministral 3 8B	0.591	0.793
Qwen 3 8B Base	0.700	0.576	0.760	0.639
Ministral 3 3B	0.652	0.511	0.735	0.707	0.592
Qwen 3 4B Base	0.405	0.530
Gemma 3 4B Base	0.516	0.294	0.430	0.626	0.589

Usage

The model can be used with the following frameworks;

vllm: See here
transformers: See here

vLLM

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 0.12.0:

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.8.6.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"

You can also make use of a ready-to-go docker image or on the docker hub.

Serve

To fully exploit the Ministral-3-14B-Reasoning-2512 we recommed using 2xH200 GPUs for deployment due to its large context. However if you don't need a large context, you can fall back to a single GPU.

A simple launch command is:


vllm serve mistralai/Ministral-3-14B-Reasoning-2512 \
 --tensor-parallel-size 2 \
 --tokenizer_mode mistral --config_format mistral --load_format mistral \
 --enable-auto-tool-choice --tool-call-parser mistral \
 --reasoning-parser mistral

Key parameter notes:

enable-auto-tool-choice: Required when enabling tool usage.
tool-call-parser mistral: Required when enabling tool usage.
reasoning-parser mistral: Required when enabling reasoning.

Additional flags:

You can set --max-model-len to preserve memory. By default it is set to 262144 which is quite large but not necessary for most scenarios.
You can set --max-num-batched-tokens to balance throughput and latency, higher means higher throughput but higher latency.

Usage of the model

Here we assume that the model mistralai/Ministral-3-8B-Reasoning-2512 is served and you can ping it to the domain localhost with the port 8000 which is the default for vLLM.

Transformers

You can also use Ministral 3 3B Reasoning 2512 with Transformers ! Make sure to install Transformers from its first v5 release candidate or from "main":

pip install transformers==5.0.0rc0

To make the best use of our model with Transformers make sure to have installed mistral-common >= 1.8.6 to use our tokenizer.

pip install mistral-common --upgrade

Then load our tokenizer along with the model and generate:

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.

Downloads last month: 9,559

Safetensors

Model size

14B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 4 Ask for provider support

Model tree for mistralai/Ministral-3-14B-Reasoning-2512

Base model

mistralai/Ministral-3-14B-Base-2512

Finetuned

(11)

this model

Adapters

1 model

Finetunes

14 models

Merges