Mistral Large 3 675B Base 2512
From our family of large models, Mistral Large 3 is a state-of-the-art general-purpose Multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters trained from scratch with 3000 H200s.
This model is the base pre-trained version, not fine-tuned for instruction or reasoning tasks, making it ideal for custom post-training processes.
Designed for reliability and long-context comprehension - It is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.
Mistral Large 3 Instruct is deployable on-premises in:
Key Features
Mistral Large 3 consists of two main architectural components:
- A Granular MoE Language Model with 673B params and 39B active
- A 2.5B Vision Encoder
The Mistral Large 3 Base model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- Frontier: Delivers best-in-class performance.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
Use Cases
With powerful long-context performance, stable and consistent cross-domain behavior, Mistral Large 3 is perfect for:
- Long Document Understanding
- Powerful Daily-Driver AI Assistants
- State-of-the-Art Agentic and Tool-Use Capabilities
- Enterprise Knowledge Work
- General Coding Assistant
And enterprise-grade use cases requiring frontier capabilities.
Recommended Settings
We recommend deploying Large 3 in a client-server configuration with the following best practices:
- System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
- Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments ; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
- Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - Avoiding overloading the model with an excessive number of tools.
- Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoiding the use of overly thin or wide images - crop them as needed to ensure optimal performance.
Known Issues / Limitations
- Not a dedicated reasoning model: Dedicated reasoning models can outperform Mistral Large 3 in strict reasoning use cases.
- Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.
- Complex deployment: Due to its large size and architecture, the model can be challenging to deploy efficiently with constrained resources or at scale.
Benchmark Results
We compare Mistral Large 3 to similar sized models.
Instruct Usage
The Instruct model can be used with the following frameworks;
vLLM
We recommend using this model with vLLM.
Installation
Make sure to install vllm >= 1.12.0:
pip install vllm --upgrade
Doing so should automatically install mistral_common >= 1.8.6.
To check:
python -c "import mistral_common; print(mistral_common.__version__)"
You can also make use of a ready-to-go docker image or on the docker hub.
Serve
The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend to use this format if you plan to fine-tuning as it can be more precise than NVFP4 in some situations.
A simple launch command is:
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tensor-parallel-size 8 \
--enable-auto-tool-choice --tool-call-parser mistral
Key parameter notes:
- enable-auto-tool-choice: Required when enabling tool usage.
- tool-call-parser mistral: Required when enabling tool usage.
Additional flags:
- You can set
--max-model-lento preserve memory. By default it is set to262144which is quite large but not necessary for most scenarios. - You can set
--max-num-batched-tokensto balance throughput and latency, higher means higher throughput but higher latency.
Usage of the model
Here we asumme that the model mistralai/Mistral-Large-3-675B-Instruct-2512 is served and you can ping it to the domain localhost with the port 8000 which is the default for vLLM.
License
This model is licensed under the Apache 2.0 License.
You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third partyβs rights, including intellectual property rights.
- Downloads last month
- 22
