URL: https://www.phoronix.com/review/amd-epyc-9575f-ai-server/2

AMD EPYC 9575F CPUs For GPU/AI Servers Show Leading Performance In Benchmarks

Written by Michael Larabel in Processors on 11 September 2025 at 08:30 AM EDT. Page 2 of 4. 10 Comments.

AMD had publicly explored the capabilities of high-frequency Turin CPUs for AI servers earlier this summer in an AMD.com blog post, in particular the importance of the host processor in enhancing the responsiveness of large language models. Using vLLM, they looked at the latency-constrained throughput of Llama 3.3 with 70B parameters in a TP8 configuration with the Sonnet 3.5 dataset when imposing time-to-first-token constraints of 300, 400, 500, and 600 ms.

AMD provided me access to their scripts for carrying out those latency-constrained throughput runs, so I decided to start off there by looking at the throughput when applying different latency constraints for Llama 3.3 70B running across the eight NVIDIA H100 GPUs on each server.

With the 300 ms and 400 ms latency constraints, the performance of the Intel Xeon Platinum AI server was atrocious. The Xeon server didn't manage even a single goodput token/s with the 300 ms constraint and didn't do much better with the 400 ms constraint either.

At the 500 ms and 600 ms latency constraints, the Intel Xeon Platinum 8592+ dual-socket server was at least in the same ballpark as the AMD EPYC 9575F Turin AI server. But even with the relaxed latency constraints, there was still a clear advantage to the AMD EPYC 9005 series processor as the host CPU for that AI server with eight NVIDIA H100 GPUs. The time to first token (TTFT) was typically 100 to 200 ms higher with the Xeon Platinum 8592+ server than with the AMD EPYC 9575F server.
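
To illustrate why a 100 to 200 ms TTFT difference matters so much at the tighter constraints, here is a minimal Python sketch of one common way to compute goodput: count only the tokens from requests whose TTFT met the limit. This is not AMD's script. The `Request` class, the `goodput_tokens_per_s` function, the 150 ms offset, and all of the sample numbers are hypothetical and chosen only for illustration. AMD's actual methodology may define the constraint differently (for example, against a TTFT percentile).

```python
from dataclasses import dataclass


@dataclass
class Request:
    ttft_ms: float        # time to first token for this request, in milliseconds
    output_tokens: int    # number of tokens generated for this request


def goodput_tokens_per_s(requests, ttft_limit_ms, duration_s):
    """Tokens per second, counting only requests whose TTFT met the limit."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    good_tokens = sum(
        r.output_tokens for r in requests if r.ttft_ms <= ttft_limit_ms
    )
    return good_tokens / duration_s


# Made-up sample data, NOT measurements from either server.
baseline = [Request(250, 100), Request(350, 100), Request(450, 100), Request(550, 100)]
# The same requests with a hypothetical 150 ms added to every TTFT.
slower = [Request(r.ttft_ms + 150, r.output_tokens) for r in baseline]

for limit in (300, 400, 500, 600):
    fast = goodput_tokens_per_s(baseline, limit, duration_s=10.0)
    slow = goodput_tokens_per_s(slower, limit, duration_s=10.0)
    print(f"TTFT <= {limit} ms: baseline {fast:.1f} tok/s, +150 ms host {slow:.1f} tok/s")
```

With this toy data, the host with the extra 150 ms gets no goodput at all under the 300 ms limit because none of its requests qualify, while the gap narrows in relative terms as the limit is relaxed. That is the same general shape as the results described above, though the real numbers depend on the full TTFT distribution of each server.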