URL: https://www.hardware-corner.net/gpu-llm-benchmarks/

GPU LLM Benchmarks | Hardware Corner


GPU Benchmarks with Local LLMs

Compare real-world local LLM inference performance across GPU models from NVIDIA, AMD, and Intel: token generation, prompt processing, and context scaling up to 256K.

GPUs tested: 21
Models tested: 20
Test format: GGUF
Quantization: Q4 to Q8
Context: 4K to 256K

All Benchmarked GPUs

GPU                     VRAM   Bandwidth
RTX 3060 12GB           12 GB  360 GB/s
RTX 3080 Ti             12 GB  912 GB/s
RTX 3090                24 GB  936 GB/s
RTX 3090 Ti             24 GB  1,008 GB/s
RTX 4060 Ti 16GB        16 GB  288 GB/s
RTX 4070                12 GB  504 GB/s
RTX 4070 SUPER          12 GB  504 GB/s
RTX 4070 Ti             12 GB  504 GB/s
RTX 4070 Ti SUPER       16 GB  672 GB/s
RTX 4080                16 GB  716 GB/s
RTX 4080 SUPER          16 GB  736 GB/s
RTX 4090                24 GB  1,008 GB/s
RTX 5060 Ti 16GB        16 GB  448 GB/s
RTX 5070                12 GB  672 GB/s
RTX 5070 Ti             16 GB  896 GB/s
RTX 5080                16 GB  960 GB/s
RTX 5090                32 GB  1,792 GB/s
RTX 6000 Ada            48 GB  960 GB/s
RTX A6000               48 GB  768 GB/s
RTX Pro 5000 Blackwell  48 GB  1,340 GB/s
RTX Pro 6000 Blackwell  96 GB  1,790 GB/s
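
One way to read the bandwidth column is as a rough ceiling on token generation speed. The sketch below assumes a common rule of thumb that this page does not state: decoding is memory-bandwidth bound, so every generated token reads all model weights once. The function name `max_tokens_per_second` and the 4.5 GB model size are hypothetical values chosen for illustration, and measured results will sit below this ceiling.

```python
# Rough upper bound on token generation speed from memory bandwidth.
# Assumption (not from the benchmark page): decoding is memory-bandwidth
# bound, so each generated token reads every model weight exactly once.


def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Theoretical ceiling: bandwidth divided by the bytes read per token."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Bandwidth figures (GB/s) copied from a few rows of the GPU table.
GPUS = {
    "RTX 3060 12GB": 360,
    "RTX 4060 Ti 16GB": 288,
    "RTX 5060 Ti 16GB": 448,
    "RTX 4090": 1008,
    "RTX 5090": 1792,
}

# Hypothetical weight size: an 8B model at about 4.5 bits per weight.
MODEL_SIZE_GB = 4.5

if __name__ == "__main__":
    for name, bandwidth in GPUS.items():
        ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
        print(f"{name}: at most about {ceiling:.0f} tokens/s")
```

The estimate ignores KV cache reads, compute limits, and kernel overhead, so it is best used to compare cards against each other, not to predict a specific benchmark number.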

Best Value: RTX 3090
Best Value Model: excellent 24GB VRAM performance per dollar

Best Local LLM GPU: RTX Pro 6000
Maximum Performance: workstation-class LLM throughput

Best 16GB GPU: RTX 5060 Ti
Efficient 16GB Choice: balanced VRAM & bandwidth

Benchmarked Models

8B – Qwen3
14B – Qwen3
20B – GPT-OSS
27B – Qwen3.5
30B – Qwen3
35B – Qwen3.5
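
To relate these model sizes to the VRAM figures listed for each GPU, a minimal sketch can estimate the weight footprint of a quantized model. It makes several assumptions the page does not: the nominal parameter count in each label is taken at face value, Q4 and Q8 are approximated as 4.5 and 8.5 bits per weight (the block sizes of llama.cpp's Q4_0 and Q8_0), and a hypothetical 2 GB of headroom is reserved for the KV cache and runtime buffers. The helper names are illustrative only.

```python
# Estimate whether a quantized model's weights fit in a GPU's VRAM.
# Assumptions (not from the benchmark page): nominal parameter counts,
# ~4.5 bits/weight for Q4 and ~8.5 bits/weight for Q8, and a fixed
# headroom for KV cache and runtime buffers.

BITS_PER_WEIGHT = {"Q4": 4.5, "Q8": 8.5}


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if the weights plus a hypothetical headroom fit in VRAM."""
    return weight_size_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # Nominal sizes from the model list; VRAM tiers from the GPU table.
    for params in (8, 14, 20, 27, 30, 35):
        for quant, bpw in BITS_PER_WEIGHT.items():
            size = weight_size_gb(params, bpw)
            tiers = [v for v in (12, 16, 24, 32, 48, 96)
                     if fits_in_vram(params, bpw, v)]
            smallest = f"{tiers[0]} GB" if tiers else "none listed"
            print(f"{params}B {quant}: ~{size:.1f} GB, smallest fitting tier: {smallest}")
```

Long contexts raise the real requirement well beyond the fixed headroom used here, since the KV cache grows with context length, so treat the result as a lower bound on the VRAM needed.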