URL: https://www.phoronix.com/review/intel-xeon-6-granite-rapids-amx/4

The Massive AI Performance Benefit With AMX On Intel Xeon 6 "Granite Rapids"

Written by Michael Larabel in Processors on 24 September 2025 at 10:20 AM EDT. Page 4 of 6. 15 Comments.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Qwen3-8B-Q8_0, Test: Text Generation 128. AMX Enabled - Default was the fastest.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Qwen3-8B-Q8_0, Test: Prompt Processing 512. AMX Enabled - Default was the fastest.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Qwen3-8B-Q8_0, Test: Prompt Processing 1024. AMX Enabled - Default was the fastest.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Qwen3-8B-Q8_0, Test: Prompt Processing 2048. AMX Enabled - Default was the fastest.

With Llama.cpp too, Advanced Matrix Extensions (AMX) proved to be a massive benefit for prompt processing with large language models like Qwen3. Peak server power consumption was higher with AMX enabled, but on a performance-per-Watt basis it still pays off.
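To make the performance-per-Watt point concrete, here is a minimal Python sketch of the arithmetic. The function names and all numbers are hypothetical placeholders, not values from the charts, and the sketch assumes efficiency is computed from average power draw during the run rather than the peak figure mentioned above.

```python
def perf_per_watt(tokens_per_second: float, avg_power_watts: float) -> float:
    """Return throughput per Watt (tokens/s per W)."""
    if avg_power_watts <= 0:
        raise ValueError("avg_power_watts must be positive")
    return tokens_per_second / avg_power_watts


def efficiency_gain(base_tps: float, base_watts: float,
                    new_tps: float, new_watts: float) -> float:
    """Ratio of the new configuration's efficiency to the baseline's.

    A value above 1.0 means the new configuration does more work per Watt,
    even if its absolute power draw is higher.
    """
    return perf_per_watt(new_tps, new_watts) / perf_per_watt(base_tps, base_watts)


# Placeholder numbers for illustration only; NOT taken from the article.
amx_off = {"tps": 100.0, "watts": 400.0}
amx_on = {"tps": 300.0, "watts": 500.0}

gain = efficiency_gain(amx_off["tps"], amx_off["watts"],
                       amx_on["tps"], amx_on["watts"])
print(f"Efficiency gain with AMX (hypothetical): {gain:.2f}x")
```

With these made-up inputs, power rises by 25% while throughput triples, so the efficiency ratio comes out well above 1.0. That is the general shape of the trade-off described above: higher power draw can still be a net win when throughput grows faster.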

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: gpt-oss-20b-Q8_0, Test: Text Generation 128. AMX Enabled - Default was the fastest.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: gpt-oss-20b-Q8_0, Test: Prompt Processing 512. AMX Enabled - Default was the fastest.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: gpt-oss-20b-Q8_0, Test: Prompt Processing 1024. AMX Enabled - Default was the fastest.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: gpt-oss-20b-Q8_0, Test: Prompt Processing 2048. AMX Enabled - Default was the fastest.

The Intel AMX numbers for Granite Rapids remained very impressive when running GPT-OSS 20B with Llama.cpp.
