The Massive AI Performance Benefit With AMX On Intel Xeon 6 "Granite Rapids"
Written by Michael Larabel in Processors on 24 September 2025 at 10:20 AM EDT.
With Llama.cpp as well, Advanced Matrix Extensions (AMX) proved to be a massive benefit, delivering much faster prompt processing with large language models like Qwen3. Peak server power consumption was higher with AMX enabled, but on a performance-per-Watt basis it still pays off.
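To make the performance-per-Watt reasoning concrete, here is a minimal Python sketch of the arithmetic. The function name `perf_per_watt` and every number in it are placeholders chosen for illustration, not measurements from this review, and it assumes efficiency is computed as throughput divided by average power draw over the run.

```python
def perf_per_watt(tokens_per_second: float, avg_power_watts: float) -> float:
    """Throughput delivered per Watt of average power draw."""
    if avg_power_watts <= 0:
        raise ValueError("avg_power_watts must be positive")
    return tokens_per_second / avg_power_watts


# Placeholder figures for illustration only, NOT results from this article.
amx_off = perf_per_watt(tokens_per_second=100.0, avg_power_watts=400.0)
amx_on = perf_per_watt(tokens_per_second=250.0, avg_power_watts=500.0)

# Ratio above 1.0 means AMX is more efficient despite drawing more power.
efficiency_gain = amx_on / amx_off

print(f"AMX off: {amx_off:.2f} tokens/s per Watt")
print(f"AMX on:  {amx_on:.2f} tokens/s per Watt")
print(f"Efficiency gain: {efficiency_gain:.2f}x")
```

In this made-up case, power draw rises by 25% while throughput rises by 150%, so efficiency doubles. The same logic applies to any run where throughput grows faster than power consumption.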
The Intel AMX numbers for Granite Rapids remained very impressive when running GPT-OSS 20B with Llama.cpp.
