mlx-community/gemma-4-12B-it-qat-OptiQ-4bit
Built with mlx-optiq, the MLX-native toolkit to quantize, fine-tune, and serve LLMs locally on Apple Silicon, no PyTorch and no cloud.
A predominantly 4-bit, mixed-precision MLX quant produced by mlx-optiq, built on Google's quantization-aware-trained (QAT) Gemma-4 base. OptIQ's sensitivity-guided per-layer bit allocation is applied on top of weights that were already trained to survive low-bit quantization, and it still beats a uniform 4-bit quant of the same QAT base by 1.37 Capability Score points.
This is a quant of google/gemma-4-12B-it-qat-q4_0-unquantized. Per-layer bit-widths come from a KL-divergence sensitivity pass on a six-domain calibration mix (prose, reasoning, code, agent, tool-call, constraint-bearing instructions). Sensitive layers go to 8-bit, robust ones stay at 4-bit.
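The idea behind a KL-divergence sensitivity pass can be illustrated with a small sketch. This is not mlx-optiq's implementation: the card does not describe how each component is perturbed or how the 8-bit budget is chosen, so the component names, the toy distributions, and the fixed `n_high` budget below are all assumptions made for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def allocate_bits(sensitivity, n_high, low_bits=4, high_bits=8):
    """Give high_bits to the n_high most sensitive components, low_bits to the rest."""
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    promoted = set(ranked[:n_high])
    return {name: high_bits if name in promoted else low_bits for name in sensitivity}

# Hypothetical next-token distribution from a reference model (3-token vocabulary).
reference = [0.7, 0.2, 0.1]

# Hypothetical output distributions after perturbing one component at a time.
perturbed = {
    "layers.0.attn": [0.5, 0.3, 0.2],
    "layers.0.mlp": [0.69, 0.21, 0.10],
    "layers.1.attn": [0.6, 0.25, 0.15],
}

# A component is "sensitive" if perturbing it moves the output distribution a lot.
sensitivity = {name: kl_divergence(reference, dist) for name, dist in perturbed.items()}

# Assumed budget: promote only the single most sensitive component to 8-bit.
plan = allocate_bits(sensitivity, n_high=1)
print(plan)
```

In a real pass the distributions would come from model logits averaged over the calibration mix, and the budget would be set by a target bits-per-weight rather than a fixed count.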
Quantization details
| Property | Value |
|---|---|
| Base | google/gemma-4-12B-it-qat-q4_0-unquantized (QAT) |
| Predominant precision | 4-bit |
| Components at 8-bit (sensitive) | 157 |
| Components at 4-bit (robust) | 171 |
| Total quantized components | 328 |
| Achieved bits-per-weight | 5.25 |
| Group size | 64 |
| Reference for sensitivity | uniform 4-bit (streamed) |
| Calibration mix | six-domain mix |
| Vision | bf16 sidecar (optiq_vision.safetensors), image+text via optiq |
| Speculative drafter | google/gemma-4-12B-it-qat-q4_0-unquantized-assistant via optiq serve --drafter |
Capability Score
The Capability Score is the mean of six metrics (MMLU, GSM8K, IFEval, BFCL, HumanEval, HashHop). It is compared here against a uniform 4-bit quant of the same QAT base; holding the base fixed isolates what the mixed-precision allocation adds.
| Benchmark | Uniform-4 (QAT base) | This model (OptIQ, QAT base) | Delta |
|---|---|---|---|
| MMLU (5-shot, 1000) | 50.9% | 52.5% | +1.6 |
| GSM8K (1000) | 93.1% | 93.3% | +0.2 |
| IFEval (full, strict) | 72.3% | 73.6% | +1.3 |
| BFCL-V3 simple (200) | 72.5% | 72.0% | -0.5 |
| HumanEval (pass@1, 164) | 90.9% | 91.5% | +0.6 |
| HashHop (long-context) | 30.0% | 35.0% | +5.0 |
| Capability Score (mean) | 68.27 | 69.64 | +1.37 |
OptIQ adds +1.37 points over uniform 4-bit on this QAT base, consistent with the margin on the other QAT Gemma-4 sizes (E2B +2.09, E4B +1.19): the per-layer allocation keeps paying off even after QAT has made the weights more quantization-robust. The mixed quant is 5.25 bits-per-weight (about 8.3 GB on disk) versus 4.0 bits-per-weight (about 6.2 GB) for uniform 4-bit, with the extra budget spent on the layers that need it.
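The arithmetic behind these figures can be re-derived with a short script. The scores are copied from the table as published (already rounded to one decimal), so the recomputed means can differ from the published 68.27 and 69.64 in the last digit. The `implied_params_billions` helper is a hypothetical back-of-the-envelope check, assuming 1 GB = 1e9 bytes and that the whole file is quantized weights, which ignores the bf16 vision sidecar and any metadata.

```python
# Scores copied from the table above: (uniform 4-bit, OptIQ), in percent.
scores = {
    "MMLU": (50.9, 52.5),
    "GSM8K": (93.1, 93.3),
    "IFEval": (72.3, 73.6),
    "BFCL-V3 simple": (72.5, 72.0),
    "HumanEval": (90.9, 91.5),
    "HashHop": (30.0, 35.0),
}

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

uniform_mean = mean(u for u, _ in scores.values())
optiq_mean = mean(o for _, o in scores.values())
delta = optiq_mean - uniform_mean

def implied_params_billions(size_gb, bits_per_weight):
    """Parameter count (billions) implied by a file size and a bits-per-weight figure.

    Assumption: size_gb is in units of 1e9 bytes and every byte stores weights.
    """
    return size_gb * 8 / bits_per_weight

print(f"uniform mean: {uniform_mean:.2f}")
print(f"OptIQ mean:   {optiq_mean:.2f}")
print(f"delta:        {delta:+.2f}")
print(f"implied params (mixed, 5.25 bpw):  {implied_params_billions(8.3, 5.25):.1f}B")
print(f"implied params (uniform, 4.0 bpw): {implied_params_billions(6.2, 4.0):.1f}B")
```

Both size figures imply a parameter count in the 12 to 13 billion range, which is consistent with a 12B model under these simplifying assumptions.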
Usage
The 12B model uses the unified Gemma-4 architecture (model_type: gemma4_unified), so it needs mlx-lm built from main plus import optiq. The unified text tower is not in the 0.31.3 PyPI release, and the main build also reports version 0.31.3, so a version pin cannot select it; install from git instead:
pip install -U mlx-optiq "mlx-lm @ git+https://github.com/ml-explore/mlx-lm.git"
import optiq # registers the gemma4_unified model type
from mlx_lm import load, generate
model, tokenizer = load("mlx-community/gemma-4-12B-it-qat-OptiQ-4bit")
print(generate(model, tokenizer, "Explain mixed-precision quantization.", max_tokens=256))
Image+text input and the speculative drafter run through mlx-optiq:
pip install mlx-optiq
optiq serve --model mlx-community/gemma-4-12B-it-qat-OptiQ-4bit \
--drafter google/gemma-4-12B-it-qat-q4_0-unquantized-assistant
Both the text-only and image+text paths depend on optiq. The bf16 vision tower ships in optiq_vision.safetensors, which mlx-lm ignores because it only globs model*.safetensors, so a single artifact serves both paths.