Will It Run AI · Calculator
Tell us what you own and what you want to do. We will rank the local models that make sense.
Start from your hardware and workload, then get a shortlist based on fit, speed, and runtime support instead of guessing from generic model lists or benchmark screenshots.
Live catalog snapshot: 196 hardware profiles, 374 models, 24 runtimes. Ranking against this snapshot keeps the calculator aligned with the current catalog instead of a static benchmark list.
Now evaluating
RTX 4070 12GB
Inputs
Pick the hardware, runtime, and workload you want to test.
Use the detected hardware if it is right, override it if it is not, and rerun the ranking to compare realistic local AI options.
1. Fit
Memory fit and headroom decide whether a model is realistic on the selected hardware.
2. Workload
The score rewards models that match the selected task and penalizes stale or legacy families when newer specialist releases exist.
3. Speed
Decode throughput and time to first token (TTFT) keep the shortlist practical for real usage, not just runs that are theoretically possible.
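The fit check in step 1 comes down to simple arithmetic. Here is a minimal sketch, assuming fit is judged by comparing weights plus KV cache plus a fixed runtime reserve against total VRAM. The function name, the 1.0 GB reserve, and the result fields are illustrative assumptions, not the calculator's actual logic.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    used_gb: float       # weights + KV cache + reserve
    headroom_gb: float   # VRAM left over once the model is loaded
    fits: bool           # True when headroom is not negative


def estimate_fit(vram_gb: float, weights_gb: float, kv_cache_gb: float,
                 reserve_gb: float = 1.0) -> FitResult:
    """Compare a model's memory footprint against available VRAM.

    reserve_gb is a hypothetical allowance for runtime buffers and the
    desktop; the real calculator may account for overhead differently.
    """
    used = weights_gb + kv_cache_gb + reserve_gb
    headroom = vram_gb - used
    return FitResult(used_gb=round(used, 2),
                     headroom_gb=round(headroom, 2),
                     fits=headroom >= 0)


# Example: a 12 GB card, 5.5 GB of weights, 2.2 GB of KV cache.
result = estimate_fit(vram_gb=12.0, weights_gb=5.5, kv_cache_gb=2.2)
print(result)
```

A model that fits with only a small positive headroom is the kind of case a tool might flag as a tight fit. Where that boundary sits is a design choice this sketch leaves open.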
Qwen
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 3.5 9B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
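Labels such as Bandwidth and Bottleneck reflect a common rule of thumb: single-stream decode is usually memory-bound, so tokens per second is capped at roughly the memory bandwidth divided by the bytes read per token, which is close to the weight size. The sketch below assumes that rule. The 504 GB/s value is the published bandwidth of the GDDR6X RTX 4070, and the 0.6 efficiency factor is an illustrative assumption. The calculator's own estimate may be computed differently.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, weights_gb: float,
                             efficiency: float = 0.6) -> float:
    """Rough upper bound on memory-bound decode throughput.

    Each generated token requires reading (approximately) all weights once,
    so throughput is bandwidth / weight size, scaled by a hypothetical
    efficiency factor for real-world overhead.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return bandwidth_gb_s * efficiency / weights_gb


# Assumed inputs: ~504 GB/s card bandwidth, 5.5 GB of quantized weights.
estimate = decode_tokens_per_second(bandwidth_gb_s=504.0, weights_gb=5.5)
print(f"~{estimate:.0f} tokens/s (rough upper bound)")
```

The estimate ignores KV cache reads and prompt processing, so it says nothing about TTFT, which depends on compute rather than bandwidth.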
Fit: Runs well with 32K safe context.
Runtime support: native via GGUF on cuda-local.
All 374 models
Full compatibility grid for RTX 4070 12GB
244 models fit · 9 excellent · 37 great
Columns: Grade · Model · Params · Tasks · Q4 VRAM · Decode · Context · Memory · Fit
Weights: 5.5 GB
KV cache: 2.2 GB
Backend: cuda-local
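The KV cache figure depends on model shape and context length. The sketch below uses the standard estimate for a transformer with grouped-query attention: 2 (keys and values) × layers × KV heads × head dimension × context tokens × bytes per element. The model dimensions in the example are hypothetical placeholders, not the configuration of any model in the catalog. The sketch reports binary gigabytes (GiB), which may not match the units the page uses.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache added per token (keys + values, all layers)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Total KV cache size in GiB for a given context length."""
    per_token = kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem)
    return per_token * context_tokens / GIB


def max_context_tokens(budget_gib: float, layers: int, kv_heads: int,
                       head_dim: int, bytes_per_elem: int = 2) -> int:
    """Largest context whose KV cache fits in the given memory budget."""
    per_token = kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem)
    return int(budget_gib * GIB // per_token)


# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
size = kv_cache_gib(layers=32, kv_heads=8, head_dim=128, context_tokens=32768)
limit = max_context_tokens(budget_gib=4.0, layers=32, kv_heads=8, head_dim=128)
print(size, limit)
```

Inverting the formula, as `max_context_tokens` does, is one plausible way to arrive at a "safe context" figure: take the memory left after weights and overhead and divide it by the per-token cache cost.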
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 122.0 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
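A score that combines several factors can be sketched as a weighted sum. The factor names below come from the sentence above. The weights, the 0 to 1 sub-score scale, and the sample values are all illustrative assumptions, since the page does not publish the actual formula.

```python
# Hypothetical weights; the calculator's real weighting is not documented here.
FACTOR_WEIGHTS = {
    "workload_match": 30.0,
    "catalog_freshness": 20.0,
    "fit_safety": 20.0,
    "context_coverage": 10.0,
    "artifact_choice": 10.0,
    "memory_utilization": 10.0,
    "throughput": 15.0,
    "latency": 10.0,
}


def combined_score(subscores: dict) -> float:
    """Weighted sum of per-factor sub-scores, each expected in [0, 1]."""
    missing = set(FACTOR_WEIGHTS) - set(subscores)
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    for name in FACTOR_WEIGHTS:
        if not 0.0 <= subscores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sum(weight * subscores[name]
               for name, weight in FACTOR_WEIGHTS.items())


# Made-up sub-scores for a model that matches the workload and fits well.
sample = {name: 1.0 for name in FACTOR_WEIGHTS}
sample["throughput"] = 0.8
sample["latency"] = 0.9
print(round(combined_score(sample), 1))
```

With these assumed weights the maximum is 125, which is why a strong candidate can score above 100. The real scale may be built differently.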
CodeGeeX
Current · Released Jul 2024 · Hugging Face · Ollama
Why it wins
CodeGeeX 4 9B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 116K safe context.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 5.5 GB
KV cache: 0.6 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 114.6 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Gemma
Frontier · Released Apr 2026 · Hugging Face · Ollama · LM Studio
Why it wins
Gemma 4 E4B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 63K safe context.
Runtime support: native via GGUF on cuda-local.
Weights: 4.9 GB
KV cache: 1.3 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 110.2 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Codestral
Current · Released Jul 2024 · Hugging Face · Ollama
Why it wins
Codestral Mamba 7B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 184K safe context.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.3 GB
KV cache: 0.5 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 107.2 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Yi
Current · Released Sep 2024 · Hugging Face · Ollama · LM Studio
Why it wins
Yi Coder 9B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 48K safe context.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 5.5 GB
KV cache: 1.5 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 106.6 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Granite
Current · Released Apr 2026 · Hugging Face · Ollama
Why it wins
Granite 4.1 8B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 33K safe context.
Runtime support: native via GGUF on cuda-local.
Weights: 4.9 GB
KV cache: 2.4 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 102.3 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen
Current · Released Sep 2024 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 2.5 Coder 7B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 105K safe context.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.3 GB
KV cache: 0.9 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 101.0 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen
Frontier · Released Apr 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 3 8B is viable for Coding, but is not the most specialized choice. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 37K safe context.
Runtime support: native via GGUF on cuda-local.
Weights: 4.9 GB
KV cache: 2.2 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 99.6 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Nemotron
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Nemotron Nano 9B v2 is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It should run, but memory headroom will be limited. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Tight · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Good · Bottleneck: Balanced
Fit: Tight fit with 29K safe context.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 5.5 GB
KV cache: 2.4 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation fits with limited memory headroom and has acceptable estimated speed for the selected workload.
Score 99.4 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 3.5 4B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with 48K safe context.
Runtime support: native via GGUF on cuda-local.
Weights: 3.3 GB
KV cache: 2.2 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 93.6 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.