Compare real-world local LLM inference performance across GPU models from NVIDIA, AMD, and Intel: token generation, prompt processing, and context scaling up to 256K.
| GPU | VRAM | Bandwidth | Details |
|---|---|---|---|
| RTX 3060 12GB | 12 GB | 360 GB/s | View Benchmarks → |
| RTX 3080 Ti | 12 GB | 912 GB/s | View Benchmarks → |
| RTX 3090 | 24 GB | 936 GB/s | View Benchmarks → |
| RTX 3090 Ti | 24 GB | 1,008 GB/s | View Benchmarks → |
| RTX 4060 Ti 16GB | 16 GB | 288 GB/s | View Benchmarks → |
| RTX 4070 | 12 GB | 504 GB/s | View Benchmarks → |
| RTX 4070 SUPER | 12 GB | 504 GB/s | View Benchmarks → |
| RTX 4070 Ti | 12 GB | 504 GB/s | View Benchmarks → |
| RTX 4070 Ti SUPER | 16 GB | 672 GB/s | View Benchmarks → |
| RTX 4080 | 16 GB | 716 GB/s | View Benchmarks → |
| RTX 4080 SUPER | 16 GB | 736 GB/s | View Benchmarks → |
| RTX 4090 | 24 GB | 1,008 GB/s | View Benchmarks → |
| RTX 5060 Ti 16GB | 16 GB | 448 GB/s | View Benchmarks → |
| RTX 5070 | 12 GB | 672 GB/s | View Benchmarks → |
| RTX 5070 Ti | 16 GB | 896 GB/s | View Benchmarks → |
| RTX 5080 | 16 GB | 960 GB/s | View Benchmarks → |
| RTX 5090 | 32 GB | 1,792 GB/s | View Benchmarks → |
| RTX 6000 Ada | 48 GB | 960 GB/s | View Benchmarks → |
| RTX A6000 | 48 GB | 768 GB/s | View Benchmarks → |
| RTX Pro 5000 Blackwell | 48 GB | 1,340 GB/s | View Benchmarks → |
| RTX Pro 6000 Blackwell | 96 GB | 1,790 GB/s | View Benchmarks → |
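
The bandwidth column matters because token generation is usually limited by how fast the weights can be read from VRAM. As a rough sketch, and not a substitute for the measured benchmarks, the snippet below estimates an upper bound on decode speed by dividing bandwidth by the size of the weights. It assumes a dense model whose weights are all read once per generated token, ignores KV cache traffic and compute limits, and uses a hypothetical quantization of 4.5 bits per weight; the function names are illustrative only.

```python
# Rough ceiling on token generation speed from memory bandwidth alone.
# Assumptions (not from the benchmark data): dense model, every weight
# read once per token, no KV cache traffic, 4.5 bits per weight.

def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         params_billion: float,
                         bits_per_weight: float = 4.5) -> float:
    """Bandwidth-bound upper limit on tokens per second."""
    return bandwidth_gb_s * 1e9 / weight_bytes(params_billion, bits_per_weight)


# A few bandwidth figures taken from the table above (GB/s).
BANDWIDTH_GB_S = {
    "RTX 3060 12GB": 360,
    "RTX 4090": 1008,
    "RTX 5090": 1792,
}

for gpu, bandwidth in BANDWIDTH_GB_S.items():
    ceiling = decode_ceiling_tok_s(bandwidth, params_billion=8)
    print(f"{gpu}: at most ~{ceiling:.0f} tok/s for a dense 8B model")
```

Measured results will land below this ceiling, and mixture-of-experts models that read only part of their weights per token do not follow this estimate.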

| GPU | Highlight |
|---|---|
| RTX 3090 | Excellent 24 GB VRAM performance per dollar |
| RTX Pro 6000 | Workstation-class LLM throughput |
| RTX 5060 Ti | Balanced VRAM & bandwidth |

8B – Qwen3
14B – Qwen3
20B – GPT-OSS
27B – Qwen3.5
30B – Qwen3
35B – Qwen3.5
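
To relate these model sizes to the VRAM column, the sketch below estimates the smallest VRAM tier from the table that could hold each model's weights. It is only a back-of-the-envelope check under stated assumptions: a hypothetical 4.5 bits per weight, a hypothetical fixed 2 GB allowance for KV cache and runtime overhead, and the nominal parameter counts from the list above. Long contexts need considerably more memory than this allowance.

```python
# Smallest VRAM tier (GB) that fits each model's weights plus a fixed allowance.
# Assumptions (not from the benchmark data): 4.5 bits per weight and
# 2 GB of overhead for KV cache and runtime buffers.

VRAM_TIERS_GB = [12, 16, 24, 32, 48, 96]  # distinct VRAM sizes in the GPU table

MODEL_SIZES_B = {
    "Qwen3 8B": 8,
    "Qwen3 14B": 14,
    "GPT-OSS 20B": 20,
    "Qwen3.5 27B": 27,
    "Qwen3 30B": 30,
    "Qwen3.5 35B": 35,
}


def smallest_tier_gb(params_billion, bits_per_weight=4.5, overhead_gb=2.0):
    """Return the smallest listed VRAM tier that fits, or None if none does."""
    # Billions of parameters * bits / 8 gives gigabytes (10^9 bytes) directly.
    needed_gb = params_billion * bits_per_weight / 8 + overhead_gb
    for tier in VRAM_TIERS_GB:
        if needed_gb <= tier:
            return tier
    return None


for name, size in MODEL_SIZES_B.items():
    tier = smallest_tier_gb(size)
    label = f"{tier} GB" if tier is not None else "no listed tier"
    print(f"{name}: {label}")
```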