Llama.cpp AI Performance With The GeForce RTX 5090

Written by Michael Larabel in Graphics Cards on 27 January 2025 at 02:33 PM EST. Page 3 of 3. 44 Comments.

[Graph: Llama.cpp benchmark with settings of Backend: NVIDIA CUDA, Model: Mistral-7B-Instruct-v0.3-Q8_0, Test: Text Generation 128. RTX 5090 was the fastest.]

For 128-token text generation with Mistral 7B, the RTX 5090 showed an excellent generational uplift, similar in scope to the Llama 3.1 win: it managed 1.58x the performance of the RTX 4090.


On a performance-per-Watt basis, this $1999 USD graphics card remained comparable to the RTX 4080 and RTX 4090.
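For readers who want to work out a comparable efficiency figure from their own runs, here is a minimal Python sketch. It assumes performance-per-Watt is simply throughput in tokens per second divided by average GPU power draw over the run. The card names and readings below are hypothetical placeholders, not results from this review.

```python
def tokens_per_second_per_watt(tokens_per_second: float, avg_power_watts: float) -> float:
    """Return throughput divided by average power draw (tokens/s per Watt)."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    return tokens_per_second / avg_power_watts


# Hypothetical placeholder readings, NOT measurements from this article.
example_cards = {
    "card_a": {"tokens_per_second": 100.0, "avg_power_watts": 400.0},
    "card_b": {"tokens_per_second": 60.0, "avg_power_watts": 250.0},
}

# Compute the efficiency figure for each card.
efficiency = {
    name: tokens_per_second_per_watt(**reading)
    for name, reading in example_cards.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} tokens/s per Watt")
```

With placeholder numbers like these, a faster card that also draws proportionally more power ends up with a similar tokens/s per Watt figure, which is the sort of outcome "comparable" describes.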


GPU temperatures on the NVIDIA GeForce RTX 5090 Founders Edition continue to be great for a dual-slot graphics card, especially considering its higher power use.

[Graph: Llama.cpp benchmark with settings of Backend: NVIDIA CUDA, Model: Mistral-7B-Instruct-v0.3-Q8_0, Test: Prompt Processing 2048. RTX 5090 was the fastest.]

For prompt processing with Mistral 7B, the RTX 5090 was at 1.17x the performance of the RTX 4090.
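To put the two Mistral 7B ratios side by side, the short Python sketch below converts the "Nx the performance" figures quoted in this article into percentage uplift over the RTX 4090. The helper name `uplift_percent` is just an illustrative choice.

```python
def uplift_percent(ratio: float) -> float:
    """Convert a relative-performance ratio (e.g. 1.58x) to a percentage uplift."""
    return (ratio - 1.0) * 100.0


# RTX 5090 vs. RTX 4090 ratios quoted in the article for Mistral 7B.
ratios_vs_rtx4090 = {
    "Text Generation 128": 1.58,
    "Prompt Processing 2048": 1.17,
}

uplifts = {test: uplift_percent(ratio) for test, ratio in ratios_vs_rtx4090.items()}

for test, pct in uplifts.items():
    print(f"{test}: {pct:.0f}% faster than the RTX 4090")
```

The generational gain is thus considerably larger for text generation than for prompt processing on this model.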

Let me know by commenting in the forums if you are interested in seeing more Llama.cpp GPU benchmarks moving forward. Apologies for the brief testing, which was due to only having had an NVIDIA RTX 50 Linux driver build for a few days. Thanks to NVIDIA for providing the GeForce RTX 5090 review sample for Linux testing at Phoronix.




URL: https://www.phoronix.com/review/nvidia-rtx5090-llama-cpp/3
