gemma-4-12B-it-qat-GGUF

gemma-4-12B-it-qat-q4_0-unquantized is a 12-billion-parameter, instruction-tuned vision-language model from Google DeepMind and part of the Gemma 4 family. It is optimized with Quantization-Aware Training (QAT) to preserve bfloat16-level quality while significantly reducing memory requirements.

The model uses a unified, encoder-free architecture that projects raw image patches and audio waveforms directly into the LLM's embedding space. It supports text, image, and audio modalities and offers a 256K-token context window. Its hybrid attention mechanism interleaves local sliding-window attention with full global attention. Other features include multilingual support across 140+ languages, native function calling, and a configurable thinking/reasoning mode. Reported benchmark scores include 77.2% on MMLU Pro, 78.8% on GPQA Diamond, and 77.5% on AIME 2026.

The Q4_0 unquantized variant refers specifically to the half-precision weights extracted from the QAT pipeline, which makes it better suited to custom downstream compilation and research than to direct deployment.

Google DeepMind’s Gemma 4 Quantization-Aware Training (QAT) releases compress models by simulating lower-precision weights during training itself. This substantially reduces VRAM requirements and speeds up local inference on consumer hardware and mobile devices, while keeping quality close to that of the uncompressed baselines.
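To make "simulating lower precision during training" concrete, here is a minimal sketch of the round-trip ("fake quantization") step that QAT schemes generally apply to weights in the forward pass. It is an illustration only: the symmetric 4-bit grid, the block size of 32, and the function names `fake_quantize_block` and `fake_quantize` are assumptions chosen for this example, not Google's actual QAT recipe and not the exact Q4_0 layout used by GGUF tooling.

```python
def fake_quantize_block(block, levels=7):
    """Round-trip one block of weights through a symmetric 4-bit grid.

    Each weight is mapped to an integer in [-levels, levels] and then
    scaled back to a float, so the result carries the rounding error
    that real 4-bit storage would introduce. (Hypothetical helper.)
    """
    max_abs = max(abs(w) for w in block)
    if max_abs == 0.0:
        return list(block)  # an all-zero block has nothing to quantize
    scale = max_abs / levels  # one shared scale per block
    out = []
    for w in block:
        q = round(w / scale)               # snap to the nearest grid point
        q = max(-levels, min(levels, q))   # clamp to the representable range
        out.append(q * scale)              # dequantize back to float
    return out


def fake_quantize(weights, block_size=32):
    """Apply the block-wise round trip to a flat list of weights."""
    out = []
    for start in range(0, len(weights), block_size):
        out.extend(fake_quantize_block(weights[start:start + block_size]))
    return out


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0]
    print(fake_quantize(weights, block_size=4))
```

During training, the network would compute its loss with the round-tripped weights while still updating the full-precision copies, commonly by passing gradients through the rounding step unchanged. Training against that rounding error is what lets the final weights tolerate 4-bit storage with little quality loss.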

Model Files

File Name	Quant Type	File Size	File Link
gemma-4-12B-it-qat-q4_0-unquantized.BF16.gguf	BF16	23.8 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.F16.gguf	F16	23.8 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q2_K.gguf	Q2_K	4.83 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q3_K_L.gguf	Q3_K_L	6.57 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q3_K_M.gguf	Q3_K_M	6.09 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q3_K_S.gguf	Q3_K_S	5.53 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q4_0.gguf	Q4_0	6.98 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q4_K_M.gguf	Q4_K_M	7.38 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q4_K_S.gguf	Q4_K_S	7.02 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q5_0.gguf	Q5_0	8.34 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q5_K_M.gguf	Q5_K_M	8.55 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q5_K_S.gguf	Q5_K_S	8.34 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q6_K.gguf	Q6_K	9.79 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.Q8_0.gguf	Q8_0	12.7 GB	Download
gemma-4-12B-it-qat-q4_0-unquantized.mmproj-bf16.gguf	mmproj-bf16	175 MB	Download
gemma-4-12B-it-qat-q4_0-unquantized.mmproj-f16.gguf	mmproj-f16	175 MB	Download
gemma-4-12B-it-qat-q4_0-unquantized.mmproj-q8_0.gguf	mmproj-q8_0	159 MB	Download
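As a rough aid for choosing among the files above, the following sketch picks the largest quantization whose file fits within a given memory budget. The sizes are copied from the table; the helper name `pick_quant` and the default `headroom_gb` value are assumptions made for this example. File size is only a lower bound on real memory use, since the context cache, the mmproj file, and runtime buffers all add to it, so treat the result as a starting point rather than a guarantee.

```python
# Main model file sizes in GB, copied from the table above.
MODEL_FILES_GB = {
    "BF16": 23.8,
    "F16": 23.8,
    "Q2_K": 4.83,
    "Q3_K_L": 6.57,
    "Q3_K_M": 6.09,
    "Q3_K_S": 5.53,
    "Q4_0": 6.98,
    "Q4_K_M": 7.38,
    "Q4_K_S": 7.02,
    "Q5_0": 8.34,
    "Q5_K_M": 8.55,
    "Q5_K_S": 8.34,
    "Q6_K": 9.79,
    "Q8_0": 12.7,
}


def pick_quant(budget_gb, headroom_gb=1.5):
    """Return the largest quant type that fits the budget, or None.

    headroom_gb is an assumed allowance for the context cache and
    runtime buffers; tune it for your context length and backend.
    """
    fitting = {
        quant: size
        for quant, size in MODEL_FILES_GB.items()
        if size + headroom_gb <= budget_gb
    }
    if not fitting:
        return None
    # Largest file that fits, on the assumption that more bits per
    # weight generally means less quantization error.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (4.0, 8.0, 12.0, 16.0):
        print(budget, "GB ->", pick_quant(budget))
```

Because the weights come from a Q4_0 QAT pipeline, the Q4_0 file may be a sensible default even when a larger file would fit; the size-based rule here ignores that and ranks purely by file size.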