Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 👁 Model Icon

Model Overview

Model Architecture: Llama4ForConditionalGeneration
- Input: Text / Image
- Output: Text
Model Optimizations:
- Activation quantization: None
- Weight quantization: INT4
Release Date: 04/25/2025
Version: 1.0
Validated on: RHOAI 2.20, RHAIIS 3.0, RHELAI 1.5
Model Developers: Red Hat (Neural Magic)

Model Optimizations

This model was obtained by quantizing weights of Llama-4-Scout-17B-16E-Instruct to INT4 data type. This optimization reduces the number of bits used to represent weights from 16 to 4, reducing GPU memory requirements by approximately 75%. Weight quantization also reduces disk size requirements by approximately 75%. The llm-compressor library is used for quantization.

Deployment

This model can be deployed efficiently on vLLM, Red Hat Enterprise Linux AI, and Openshift AI, as shown in the example below.

Deploy on vLLM

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16"
number_gpus = 4

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Give me a short introduction to large language model."

llm = LLM(model=model_id, tensor_parallel_size=number_gpus)

outputs = llm.generate(prompt, sampling_params)

generated_text = outputs[0].outputs[0].text
print(generated_text)

vLLM also supports OpenAI-compatible serving. See the documentation for more details.

Evaluation

The model was evaluated on the OpenLLM leaderboard tasks (v1 and v2), long context RULER, multimodal MMMU, and multimodal ChartQA. All evaluations are obtained through lm-evaluation-harness.

Accuracy

	Recovery (%)	meta-llama/Llama-4-Scout-17B-16E-Instruct	RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 (this model)
ARC-Challenge 25-shot	98.51	69.37	68.34
GSM8k 5-shot	100.4	90.45	90.90
HellaSwag 10-shot	99.67	85.23	84.95
MMLU 5-shot	99.75	80.54	80.34
TruthfulQA 0-shot	99.82	61.41	61.30
WinoGrande 5-shot	98.98	77.90	77.11
OpenLLM v1 Average Score	99.59	77.48	77.16
IFEval 0-shot avg of inst and prompt acc	99.51	86.90	86.47
Big Bench Hard 3-shot	99.46	65.13	64.78
Math Lvl 5 4-shot	99.22	57.78	57.33
GPQA 0-shot	100.0	31.88	31.88
MuSR 0-shot	100.9	42.20	42.59
MMLU-Pro 5-shot	98.67	55.70	54.96
OpenLLM v2 Average Score	99.54	56.60	56.34
MMMU 0-shot	100.6	53.44	53.78
ChartQA 0-shot exact_match	100.1	65.88	66.00
ChartQA 0-shot relaxed_accuracy	99.55	88.92	88.52
Multimodal Average Score	100.0	69.41	69.43
RULER seqlen = 131072 niah_multikey_1	98.41	88.20	86.80
RULER seqlen = 131072 niah_multikey_2	94.73	83.60	79.20
RULER seqlen = 131072 niah_multikey_3	96.44	78.80	76.00
RULER seqlen = 131072 niah_multiquery	98.79	95.40	94.25
RULER seqlen = 131072 niah_multivalue	101.6	73.75	74.95
RULER seqlen = 131072 niah_single_1	100.0	100.00	100.0
RULER seqlen = 131072 niah_single_2	100.0	99.80	99.80
RULER seqlen = 131072 niah_single_3	100.2	99.80	100.0
RULER seqlen = 131072 ruler_cwe	87.39	39.42	33.14
RULER seqlen = 131072 ruler_fwe	98.13	92.93	91.20
RULER seqlen = 131072 ruler_qa_hotpot	100.4	48.20	48.40
RULER seqlen = 131072 ruler_qa_squad	96.22	53.57	51.55
RULER seqlen = 131072 ruler_qa_vt	98.82	92.28	91.20
RULER seqlen = 131072 Average Score	98.16	80.44	78.96

Downloads last month: 6,644

Safetensors

Model size

109B params

Tensor type

BF16

I64

I32

Model tree for RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16

Base model

meta-llama/Llama-4-Scout-17B-16E

Finetuned

meta-llama/Llama-4-Scout-17B-16E-Instruct

Quantized

(36)

this model

Collections including RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16

Quantized variants of the Llama 4 release by Meta. • 4 items • Updated Apr 30 • 2

May 2025 Collection of third-party generative AI models validated by Red Hat AI for use across the Red Hat AI Product Portfolio. • 39 items • Updated Apr 30 • 20

URL: https://huggingface.co/RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16

⇱ RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 · Hugging Face