GPU Benchmarks with Local LLMs

Compare real-world local LLM inference performance across GPU models from NVIDIA, AMD, and Intel: token generation, prompt processing, and context scaling up to 256K.

GPUs tested: 21
Models tested: 20
Test format: GGUF
Quantization: Q4 to Q8
Context: 4K to 256K

All Benchmarked GPUs

GPU	VRAM	Bandwidth	Details
RTX 3060 12GB	12 GB	360 GB/s	View Benchmarks →
RTX 3080 Ti	12 GB	912 GB/s	View Benchmarks →
RTX 3090	24 GB	936 GB/s	View Benchmarks →
RTX 3090 Ti	24 GB	1,008 GB/s	View Benchmarks →
RTX 4060 Ti 16GB	16 GB	288 GB/s	View Benchmarks →
RTX 4070	12 GB	504 GB/s	View Benchmarks →
RTX 4070 SUPER	12 GB	504 GB/s	View Benchmarks →
RTX 4070 Ti	12 GB	504 GB/s	View Benchmarks →
RTX 4070 Ti SUPER	16 GB	672 GB/s	View Benchmarks →
RTX 4080	16 GB	716 GB/s	View Benchmarks →
RTX 4080 SUPER	16 GB	736 GB/s	View Benchmarks →
RTX 4090	24 GB	1,008 GB/s	View Benchmarks →
RTX 5060 Ti 16GB	16 GB	448 GB/s	View Benchmarks →
RTX 5070	12 GB	672 GB/s	View Benchmarks →
RTX 5070 Ti	16 GB	896 GB/s	View Benchmarks →
RTX 5080	16 GB	960 GB/s	View Benchmarks →
RTX 5090	32 GB	1,792 GB/s	View Benchmarks →
RTX 6000 Ada	48 GB	960 GB/s	View Benchmarks →
RTX A6000	48 GB	768 GB/s	View Benchmarks →
RTX Pro 5000 Blackwell	48 GB	1,340 GB/s	View Benchmarks →
RTX Pro 6000 Blackwell	96 GB	1,790 GB/s	View Benchmarks →
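
For readers who want to shortlist cards from the table above, here is a minimal sketch that filters by a VRAM floor and orders the remaining cards by memory bandwidth. The `Gpu` type and `rank_by_bandwidth` helper are hypothetical names made up for this illustration, and only a handful of rows are copied in. The ordering reflects spec-sheet bandwidth only, not measured benchmark results.

```python
from typing import NamedTuple


class Gpu(NamedTuple):
    """One row of the spec table (hypothetical container for illustration)."""
    name: str
    vram_gb: int
    bandwidth_gbs: int


# A few rows copied from the table above; extend with the rest as needed.
GPUS = [
    Gpu("RTX 4060 Ti 16GB", 16, 288),
    Gpu("RTX 4070 Ti SUPER", 16, 672),
    Gpu("RTX 4090", 24, 1008),
    Gpu("RTX 5060 Ti 16GB", 16, 448),
    Gpu("RTX 5080", 16, 960),
    Gpu("RTX 5090", 32, 1792),
]


def rank_by_bandwidth(gpus, min_vram_gb):
    """Return GPUs with at least min_vram_gb of VRAM, highest bandwidth first."""
    eligible = [g for g in gpus if g.vram_gb >= min_vram_gb]
    return sorted(eligible, key=lambda g: g.bandwidth_gbs, reverse=True)


if __name__ == "__main__":
    # Example: cards with at least 24 GB of VRAM.
    for gpu in rank_by_bandwidth(GPUS, min_vram_gb=24):
        print(f"{gpu.name}: {gpu.vram_gb} GB, {gpu.bandwidth_gbs} GB/s")
```

Raising `min_vram_gb` narrows the list to cards that can hold larger models or longer contexts; the per-GPU benchmark pages remain the reference for actual throughput.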

Best Value: RTX 3090
Best Value Model. Excellent 24GB VRAM performance per dollar.

Best Local LLM GPU: RTX Pro 6000 Blackwell
Maximum Performance. Workstation-class LLM throughput.

Best 16GB GPU: RTX 5060 Ti 16GB
Efficient 16GB Choice. Balanced VRAM and bandwidth.

Benchmarked Models

8B – Qwen3
14B – Qwen3
20B – GPT-OSS
27B – Qwen3.5
30B – Qwen3
35B – Qwen3.5

URL: https://www.hardware-corner.net/gpu-llm-benchmarks/
