Voxtral Mini 1.0 (3B) - 2507

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.

Learn more about Voxtral in our blog post here and our research paper.

Key Features

Voxtral builds upon Ministral-3B with powerful audio understanding capabilities.

Dedicated transcription mode: Voxtral can operate in a pure speech transcription mode to maximize performance. By default, Voxtral automatically predicts the source audio language and transcribes the text accordingly
Long-form context: With a 32k token context length, Voxtral handles audios up to 30 minutes for transcription, or 40 minutes for understanding
Built-in Q&A and summarization: Supports asking questions directly through audio. Analyze audio and generate structured summaries without the need for separate ASR and language models
Natively multilingual: Automatic language detection and state-of-the-art performance in the world’s most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian)
Function-calling straight from voice: Enables direct triggering of backend functions, workflows, or API calls based on spoken user intents
Highly capable at text: Retains the text understanding capabilities of its language model backbone, Ministral-3B

Benchmark Results

Audio

Average word error rate (WER) over the FLEURS, Mozilla Common Voice and Multilingual LibriSpeech benchmarks:

👁 image/png

Text

👁 image/png

Usage

The model can be used with the following frameworks;

vllm (recommended): See here
Transformers 🤗: See here

Notes:

temperature=0.2 and top_p=0.95 for chat completion (e.g. Audio Understanding) and temperature=0.0 for transcription
Multiple audios per message and multiple user turns with audio are supported
System prompts are not yet supported

vLLM (recommended)

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 0.10.0, we recommend using uv:

uv pip install -U "vllm[audio]" --system

Doing so should automatically install mistral_common >= 1.8.1.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"

Offline

You can test that your vLLM setup works as expected by cloning the vLLM repo:

git clone https://github.com/vllm-project/vllm && cd vllm

and then running:

python examples/offline_inference/audio_language.py --num-audios 2 --model-type voxtral

Serve

We recommend that you use Voxtral-Small-24B-2507 in a server/client setting.

Spin up a server:

vllm serve mistralai/Voxtral-Mini-3B-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral

Note: Running Voxtral-Mini-3B-2507 on GPU requires ~9.5 GB of GPU RAM in bf16 or fp16.

To ping the client you can use a simple Python snippet. See the following examples.

Audio Instruct

Leverage the audio capabilities of Voxtral-Mini-3B-2507 to chat.

Make sure that your client has mistral-common with audio installed:

pip install --upgrade mistral_common\[audio\]

Transcription

Voxtral-Mini-3B-2507 has powerful transcription capabilities!

Make sure that your client has mistral-common with audio installed:

pip install --upgrade mistral_common\[audio\]

Transformers 🤗

Starting with transformers >= 4.54.0 and above, you can run Voxtral natively!

Install Transformers:

pip install -U transformers

Make sure to have mistral-common >= 1.8.1 installed with audio dependencies:

pip install --upgrade "mistral-common[audio]"

Audio Instruct

Transcription

Downloads last month: 268,408

Safetensors

Model size

5B params

Tensor type

F32

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 66 Ask for provider support

Model tree for mistralai/Voxtral-Mini-3B-2507

Adapters

12 models

Finetunes

19 models

Quantizations

19 models

Spaces using mistralai/Voxtral-Mini-3B-2507 23

Collection including mistralai/Voxtral-Mini-3B-2507

Mistral AI Audio models. • 4 items • Updated Apr 14 • 10

Paper for mistralai/Voxtral-Mini-3B-2507

Paper • 2507.13264 • Published Jul 17, 2025 • 35

Evaluation results

hf-audio/open-asr-leaderboard leaderboard
Mean Wer View evaluation results
👁 Image

source
7.05
Rtfx View evaluation results
👁 Image

source
109.86
Ami Wer View evaluation results
👁 Image

source
16.3

URL: https://huggingface.co/mistralai/Voxtral-Mini-3B-2507

⇱ mistralai/Voxtral-Mini-3B-2507 · Hugging Face

Voxtral Mini 1.0 (3B) - 2507

Key Features

Benchmark Results

Audio

Text

Usage

vLLM (recommended)

Installation

Offline

Serve

Audio Instruct

Transcription

Transformers 🤗

Audio Instruct

Transcription

Model tree for mistralai/Voxtral-Mini-3B-2507

Spaces using mistralai/Voxtral-Mini-3B-2507 23

Collection including mistralai/Voxtral-Mini-3B-2507

Paper for mistralai/Voxtral-Mini-3B-2507

Evaluation results