NVIDIA Vera CPU Benchmarks: Olympus Cores Delivering The Best Performance Ever Seen On ARM
[Image: NVIDIA Vera test bed for benchmarking]
For quantifying the performance of NVIDIA Vera, the following configurations/processors were tested:
NVIDIA Vera - NVIDIA Vera in its full load-out of 88 cores / 176 threads with its 8 x 96GB LPDDR5-9600MT/s memory. (For those wondering about an SMT on/off comparison, that will come in a separate article.) The Vera CPU as tested had a peak 450 Watt TDP.
NVIDIA Grace - The current-generation NVIDIA Grace CPU with its 72 Arm Neoverse-V2 cores, paired with 256GB of LPDDR5-8533MT/s memory.
2 x AMD EPYC 9455 - This was NVIDIA's recommended comparison point for Vera on the AMD Zen 5 side. Two AMD EPYC 9455 CPUs are the closest match to Vera in core/thread count. Each EPYC 9455 has 48 cores / 96 threads with a 3.15GHz base clock, 4.1GHz all-core boost clock, and 4.4GHz maximum boost clock, for a combined 96 cores / 192 threads compared to Vera's 88 cores / 176 threads. The EPYC 9455 has a 300 Watt TDP. All of the tested AMD EPYC configurations used 12 (or 24 for dual socket configurations) 64GB DDR5-6400 memory modules, the peak configuration supported by the AMD EPYC 9005 series.
2 x AMD EPYC 9475F - Also at the 48 cores per socket level is the EPYC 9475F, the high frequency version. The AMD EPYC 9475F with its 48 cores / 96 threads per socket has a 3.65GHz base clock, 4.4GHz all-core boost clock, and 4.8GHz maximum boost clock. The EPYC 9475F has a 400 Watt TDP.
1 x AMD EPYC 9575F - AMD's top-recommended EPYC Turin choice for the head CPU on AI servers is the EPYC 9575F. The 64-core / 128-thread high frequency part has a 3.3GHz base clock, 4.5GHz all-core boost clock, and maximum boost clock of 5.0GHz. This is the only AMD EPYC 9005 series part hitting 5.0GHz. The EPYC 9575F is another Turin CPU with a 400 Watt TDP.
2 x AMD EPYC 9575F - This high frequency 64-core part was tested in both 1P and 2P configurations so that it sits on both sides of Vera's sole 88-core configuration: 64 cores below it and 128 cores above it.
1 x AMD EPYC 9755 - AMD's flagship (non-dense) EPYC 9755 processor with 128 cores / 256 threads. The EPYC 9755 has a 2.7GHz base clock and a maximum boost clock of up to 4.1GHz. The AMD EPYC 9755 has a 500 Watt TDP, in the same range as Vera's 450 Watt peak as tested.
2 x AMD EPYC 9755 - The top AMD EPYC 9005 series 2P configuration short of the Turin Dense SKUs, for a top-end to top-end comparison between NVIDIA Vera and AMD EPYC Zen 5.
1 x Intel Xeon 6980P - Granite Rapids coverage comes from the lone Xeon 6 P-core configuration I have for testing, the 128-core Xeon 6980P. The Intel Xeon 6980P with its 128 cores / 256 threads has a 2.0GHz base frequency, 3.2GHz all-core turbo frequency, and 3.9GHz maximum turbo frequency. The Xeon 6980P has a 500 Watt TDP.
2 x Intel Xeon 6980P - Dual socket Granite Rapids for a combined 256 cores / 512 threads. The Xeon Granite Rapids processors were tested using 12/24 channels of MRDIMM-8800 memory.
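For keeping the core/thread arithmetic of the configurations above straight, here is a minimal Python sketch that tabulates them. The `Config` class and its field names are purely illustrative and not part of any benchmark tooling. The figures are the ones listed above, with two labeled exceptions: Grace is entered with one thread per core since Neoverse-V2 has no SMT, and its TDP is left unset because it isn't stated here. The combined TDP is simply the rated per-socket TDP multiplied by socket count, not a measured power draw.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    """One tested configuration (illustrative structure, not real tooling)."""
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int = 2
    tdp_watts: Optional[int] = None  # rated per-socket TDP; None if not stated

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def combined_tdp(self) -> Optional[int]:
        # Rated TDP summed over sockets; this is not a measured power figure.
        return None if self.tdp_watts is None else self.sockets * self.tdp_watts


CONFIGS = [
    Config("NVIDIA Vera", 1, 88, tdp_watts=450),
    # Assumption beyond the list above: Neoverse-V2 has no SMT; TDP not stated.
    Config("NVIDIA Grace", 1, 72, threads_per_core=1),
    Config("2 x AMD EPYC 9455", 2, 48, tdp_watts=300),
    Config("2 x AMD EPYC 9475F", 2, 48, tdp_watts=400),
    Config("1 x AMD EPYC 9575F", 1, 64, tdp_watts=400),
    Config("2 x AMD EPYC 9575F", 2, 64, tdp_watts=400),
    Config("1 x AMD EPYC 9755", 1, 128, tdp_watts=500),
    Config("2 x AMD EPYC 9755", 2, 128, tdp_watts=500),
    Config("1 x Intel Xeon 6980P", 1, 128, tdp_watts=500),
    Config("2 x Intel Xeon 6980P", 2, 128, tdp_watts=500),
]

BY_NAME = {c.name: c for c in CONFIGS}


def cores_relative_to(config: Config, baseline: Config) -> float:
    """Core count of `config` as a multiple of the baseline's core count."""
    return config.cores / baseline.cores


if __name__ == "__main__":
    vera = BY_NAME["NVIDIA Vera"]
    for c in CONFIGS:
        tdp = "n/a" if c.combined_tdp is None else f"{c.combined_tdp} W"
        print(f"{c.name:<22} {c.cores:>4} cores {c.threads:>4} threads "
              f"{cores_relative_to(c, vera):5.2f}x Vera cores  TDP {tdp}")
```

Running it makes the spread easy to see: the dual 48-core EPYC setups land just above Vera's core count, while the 1P and 2P EPYC 9575F configurations fall on either side of it.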
All tests were run on Ubuntu 24.04 LTS with the compiler upgraded to the latest GCC 16.1.
The assortment of single and dual socket Intel/AMD processors was used for a comprehensive look at Vera relative to the competition, both across core/thread counts and for helping to show which workloads are multi-threaded and scale well and which do not. Additionally, some workloads can perform worse in multi-socket configurations than on a single socket due to NUMA locality. There are, of course, TCO factors to also consider with the dual socket options, such as needing twice the number of memory modules.
The selection was limited by the CPUs on hand, which is why only the Xeon 6980P processors were tested on the Intel side: they are the lone Xeon 6 Granite Rapids review samples I have. Similarly, with Ampere Computing having needed their AmpereOne review unit back after the initial review/testing, I don't have any current generation Ampere hardware for conducting comparisons. But going off these numbers relative to EPYC/Xeon, it's easy to say that Vera is by far the most competitive ARM server CPU I have ever tested, whether on bare metal or in the public clouds.
Thanks to NVIDIA for providing the opportunity to run these initial benchmarks on the NVIDIA Vera CPU.
