Meta
Llama 3.1 405B
Tier: Frontier
Released: Jul 2024
Context: 131K tokens
License: Llama 3.1
Quality: 79 (Strong)
Llama 3.1 405B (405B parameters) requires approximately 256.2 GB of VRAM with Q4_K_M quantization: about 247.1 GB for the weights, plus KV cache and runtime overhead. For the best balance of quality and speed, we recommend hardware with at least 295 GB of VRAM.
Get started
Copy-paste commands to run Llama 3.1 405B on your machine.
Run
ollama run llama3.1:405b
Quick specs
Parameters: 405B
Architecture: dense
Context: 131K tokens
Modality: text
Min RAM: 158 GB
Rec. RAM: 247.1 GB (Q4_K_M)
License: Llama 3.1
Family: Llama
Code, Chat, Reasoning
Quantization options
VRAM estimates by quant level
| Quant | Bits | VRAM | Quality |
|---|---|---|---|
| Q2_K | 2 | 158.0 GB | Low |
| Q3_K_S | 3 | 198.5 GB | Low |
| NVFP4 | 4 | 226.8 GB | Medium |
| Q4_K_M | 4 | 247.1 GB | Medium |
| Q5_K_M | 5 | 291.6 GB | High |
| Q6_K | 6 | 332.1 GB | High |
| Q8_0 | 8 | 433.4 GB | Very High |
| F16 | 16 | 830.2 GB | Maximum |
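The nominal bit widths in the table do not map one-to-one onto the VRAM figures, so it can help to back out the effective bits per weight that each row implies. The sketch below does that arithmetic from the listed numbers. It assumes 1 GB = 10^9 bytes, exactly 405 billion parameters, and that the whole VRAM figure is weights; the function name `effective_bits_per_weight` is illustrative and not part of any tool.

```python
PARAMS_BILLIONS = 405  # parameter count quoted on this page

# (quant name, nominal bits, listed VRAM in GB), copied from the table above
QUANTS = [
    ("Q2_K", 2, 158.0),
    ("Q3_K_S", 3, 198.5),
    ("NVFP4", 4, 226.8),
    ("Q4_K_M", 4, 247.1),
    ("Q5_K_M", 5, 291.6),
    ("Q6_K", 6, 332.1),
    ("Q8_0", 8, 433.4),
    ("F16", 16, 830.2),
]


def effective_bits_per_weight(vram_gb, params_billions=PARAMS_BILLIONS):
    """Bits per parameter implied by a VRAM figure.

    Assumption: 1 GB = 1e9 bytes and the entire figure is model weights.
    GB * 8 gives gigabits; dividing by billions of parameters gives bits each.
    """
    return vram_gb * 8 / params_billions


for name, nominal_bits, vram_gb in QUANTS:
    bpw = effective_bits_per_weight(vram_gb)
    print(f"{name:8s} nominal {nominal_bits:2d} bits -> effective {bpw:.2f} bits/weight")
```

Under these assumptions every row comes out above its nominal bit width, which is consistent with quantized formats storing scales and other metadata alongside the weights.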
Quality benchmarks
Llama 3.1 405B benchmark scores
Coding
SWE-bench Verified: n/a
HumanEval+: 89.0%
Aider Polyglot: n/a
LiveCodeBench: 30.1%
Reasoning
MMLU-Pro: 73.3%
GPQA Diamond: 50.7%
MATH-500: 73.8%
ARC Challenge: 96.9%
General
Chatbot Arena: n/a
IFEval: 88.6%
Source: official · 2024-07-23
Memory breakdown
Reference: RTX 2060 6GB
Weights: 247.1 GB
KV Cache: 7.7 GB
Runtime: 0.9 GB
Headroom: 0.6 GB
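As a rough check, the breakdown can be summed and compared against a device's VRAM. This is a minimal sketch that uses only the four figures listed above. The helper names are hypothetical, the 80 GB device size is an assumed example rather than a recommendation from this page, and the device count is a naive lower bound that ignores per-device overhead and uneven sharding.

```python
import math

# Component sizes in GB, copied from the memory breakdown above (Q4_K_M).
BREAKDOWN_GB = {
    "weights": 247.1,
    "kv_cache": 7.7,
    "runtime": 0.9,
    "headroom": 0.6,
}


def total_gb(breakdown=BREAKDOWN_GB):
    """Total memory footprint in GB, rounded to one decimal place."""
    return round(sum(breakdown.values()), 1)


def fits(device_vram_gb, breakdown=BREAKDOWN_GB):
    """True if the whole footprint fits in a single device's VRAM."""
    return total_gb(breakdown) <= device_vram_gb


def devices_needed(device_vram_gb, breakdown=BREAKDOWN_GB):
    """Naive lower bound on device count: total footprint / per-device VRAM.

    Assumption: memory pools perfectly across devices with no extra overhead.
    """
    return math.ceil(total_gb(breakdown) / device_vram_gb)


print("total GB:", total_gb())
print("fits on the 6 GB reference card:", fits(6.0))
print("80 GB devices needed (assumed size):", devices_needed(80.0))
```

The components sum to 256.3 GB, within rounding of the 256.2 GB quoted at the top of the page, and far beyond the 6 GB reference card.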
