URL: https://huggingface.co/RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic

Apertus-8B-Instruct-2509-FP8-dynamic

Model Overview

  • Model Architecture: ApertusForCausalLM
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Weight quantization: FP8
    • Activation quantization: FP8
  • Release Date: 9/18/2025
  • Version: 1.0
  • Model Developers: Red Hat
  • ModelCar Storage URI: oci://registry.redhat.io/rhai/modelcar-apertus-8b-instruct-2509-fp8-dynamic:3.0
  • Validated on RHOAI 3.2: quay.io/modh/vllm:rhoai-3.2-cuda
  • Validated on RHAIIS 3.2.5: http://registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.2.5-1766067105
  • Validated on vLLM: 0.11.2

Quantized version of swiss-ai/Apertus-8B-Instruct-2509.

Model Optimizations

This model was obtained by quantizing the weights and activations of swiss-ai/Apertus-8B-Instruct-2509 to the FP8 data type. This reduces the number of bits per parameter from 16 to 8, cutting disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within the transformer blocks are quantized.
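
The approximately 50% figure follows from simple arithmetic on bytes per parameter. A minimal sketch, assuming a round 8 billion parameters and counting weights only (activations and the KV cache are ignored; `weight_memory_gb` is a hypothetical helper, not part of any library):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: a round figure taken from the "8B" in the model name.
NUM_PARAMS = 8e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit baseline
fp8_gb = weight_memory_gb(NUM_PARAMS, 8)    # 8-bit quantized weights
savings = 1 - fp8_gb / bf16_gb

print(f"BF16: {bf16_gb:.1f} GB, FP8: {fp8_gb:.1f} GB, savings: {savings:.0%}")
# prints: BF16: 16.0 GB, FP8: 8.0 GB, savings: 50%
```

This is an upper bound: because only the linear operators inside the transformer blocks are quantized, the remaining tensors stay at 16 bits and the real saving is somewhat smaller.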

Deployment

Use with vLLM

  1. Initialize vLLM server:
vllm serve RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic
  2. Send requests to the server:
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://<your-server-host>:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic"

messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

outputs = client.chat.completions.create(
    model=model,
    messages=messages,
)

generated_text = outputs.choices[0].message.content
print(generated_text)
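
The same request can also be built without the `openai` package, since the vLLM server exposes an OpenAI-compatible `/v1/chat/completions` route. A minimal sketch using only the standard library; `build_chat_request` is a hypothetical helper, and `localhost:8000` is an assumed host and port that should be replaced with the actual server address:

```python
import json
import urllib.request

MODEL = "RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic"


def build_chat_request(base_url: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: the server runs locally on vLLM's default port.
request = build_chat_request(
    "http://localhost:8000/v1",
    "Give me a short introduction to large language models.",
)

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.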

Creation

This model was created with llm-compressor.
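
As a rough illustration (not the authors' actual script), an FP8 dynamic quantization run with llm-compressor can be sketched as follows. The base checkpoint name, the output directory, the `FP8_DYNAMIC` scheme, and the choice to skip `lm_head` are assumptions inferred from the model name and the description above; `quantize_to_fp8_dynamic` is a hypothetical wrapper.

```python
# Assumption: base checkpoint as named in the accuracy table of this card.
MODEL_ID = "swiss-ai/Apertus-8B-Instruct-2509"
# Assumption: local output directory, chosen to mirror the published model name.
SAVE_DIR = "Apertus-8B-Instruct-2509-FP8-dynamic"


def quantize_to_fp8_dynamic(model_id: str = MODEL_ID, save_dir: str = SAVE_DIR) -> None:
    """Quantize the linear layers of a causal LM to FP8 and save the result."""
    # Heavy imports are kept inside the function so this file loads without them.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llmcompressor import oneshot  # older releases: llmcompressor.transformers
    from llmcompressor.modifiers.quantization import QuantizationModifier

    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Target every Linear layer but leave the output head unquantized.
    recipe = QuantizationModifier(
        targets="Linear",
        scheme="FP8_DYNAMIC",
        ignore=["lm_head"],
    )

    # No calibration dataset is passed in this sketch.
    oneshot(model=model, recipe=recipe)

    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)


# Usage (downloads the full checkpoint, so it is left commented out):
# quantize_to_fp8_dynamic()
```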

Evaluation

The model was evaluated on the OpenLLM Leaderboard V1 benchmarks.

Accuracy

Category   | Metric                            | swiss-ai/Apertus-8B-Instruct-2509 | RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic | Recovery (%)
OpenLLM V1 | ARC-Challenge (Acc-Norm, 25-shot) | 65.02                             | 65.59                                         | 101.4
           | GSM8K (Strict-Match, 5-shot)      | 58.07                             | 55.50                                         | 95.6
           | HellaSwag (Acc-Norm, 10-shot)     | 80.87                             | 81.06                                         | 100.2
           | MMLU (Acc, 5-shot)                | 61.97                             | 61.86                                         | 99.8
           | TruthfulQA (MC2, 0-shot)          | 58.14                             | 58.18                                         | 100.1
           | Winogrande (Acc, 5-shot)          | 75.14                             | 75.45                                         | 100.4
           | Average Score                     | 66.54                             | 66.33                                         | 99.7
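
The Recovery column appears to be the quantized score expressed as a percentage of the baseline score. A minimal sketch using two rows from the table above (`recovery_pct` is a hypothetical helper name):

```python
def recovery_pct(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline, rounded to one decimal."""
    return round(100 * quantized / baseline, 1)


# (baseline, quantized) pairs copied from the accuracy table.
scores = {
    "GSM8K": (58.07, 55.50),
    "MMLU": (61.97, 61.86),
}

for task, (baseline, quantized) in scores.items():
    print(f"{task}: {recovery_pct(baseline, quantized)}%")
# prints:
# GSM8K: 95.6%
# MMLU: 99.8%
```

A value above 100 means the quantized model scored slightly higher than the baseline on that task, which can happen within normal evaluation noise.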

Safetensors

  • Model size: 8B params
  • Tensor type: BF16, F8_E4M3
