gemma-4-12B-it-GGUF
google/gemma-4-12B-it from Google DeepMind is a 12B-parameter unified, encoder-free multimodal model released June 3, 2026. It is the first mid-sized Gemma 4 model with native audio input alongside text, image, and video. All modalities flow directly into the LLM backbone without separate vision or audio encoders, which reduces latency and memory footprint. Its performance approaches that of Gemma 4 26B MoE on standard benchmarks while requiring less than half the total memory (~16 GB of VRAM or unified memory), so it can run on consumer laptops with 16 GB of RAM. It also provides Multi-Token Prediction (MTP) drafters for lower latency and strong agentic reasoning for multi-step workflows. The model is released under Apache 2.0 and is supported across the developer ecosystem (Ollama, vLLM, LM Studio). Gemma 4 12B is well suited to real-time audio/visual understanding, image analysis, content categorization, context compression, and local-first AI applications that do not depend on an external API.
Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| gemma-4-12B-it.BF16.gguf | BF16 | 23.8 GB | Download |
| gemma-4-12B-it.F16.gguf | F16 | 23.8 GB | Download |
| gemma-4-12B-it.Q2_K.gguf | Q2_K | 4.83 GB | Download |
| gemma-4-12B-it.Q3_K_L.gguf | Q3_K_L | 6.57 GB | Download |
| gemma-4-12B-it.Q3_K_M.gguf | Q3_K_M | 6.09 GB | Download |
| gemma-4-12B-it.Q3_K_S.gguf | Q3_K_S | 5.53 GB | Download |
| gemma-4-12B-it.Q4_0.gguf | Q4_0 | 6.98 GB | Download |
| gemma-4-12B-it.Q4_K_M.gguf | Q4_K_M | 7.38 GB | Download |
| gemma-4-12B-it.Q4_K_S.gguf | Q4_K_S | 7.02 GB | Download |
| gemma-4-12B-it.Q5_0.gguf | Q5_0 | 8.34 GB | Download |
| gemma-4-12B-it.Q5_K_M.gguf | Q5_K_M | 8.55 GB | Download |
| gemma-4-12B-it.Q5_K_S.gguf | Q5_K_S | 8.34 GB | Download |
| gemma-4-12B-it.Q6_K.gguf | Q6_K | 9.79 GB | Download |
| gemma-4-12B-it.Q8_0.gguf | Q8_0 | 12.7 GB | Download |
| gemma-4-12B-it.mmproj-bf16.gguf | mmproj-bf16 | 175 MB | Download |
| gemma-4-12B-it.mmproj-f16.gguf | mmproj-f16 | 175 MB | Download |
| gemma-4-12B-it.mmproj-q8_0.gguf | mmproj-q8_0 | 159 MB | Download |
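
Choosing a file from the table is mostly a matter of fitting it into available memory. The sketch below picks the largest quant whose file size fits a given budget. The sizes are copied from the table; the `pick_quant` helper and its `headroom_gb` default (space reserved for the KV cache, the mmproj file, and the runtime) are illustrative assumptions, not measured requirements.

```python
# File sizes in GB, copied from the Model Files table above.
QUANT_SIZES_GB = {
    "Q2_K": 4.83,
    "Q3_K_S": 5.53,
    "Q3_K_M": 6.09,
    "Q3_K_L": 6.57,
    "Q4_0": 6.98,
    "Q4_K_S": 7.02,
    "Q4_K_M": 7.38,
    "Q5_0": 8.34,
    "Q5_K_S": 8.34,
    "Q5_K_M": 8.55,
    "Q6_K": 9.79,
    "Q8_0": 12.7,
    "F16": 23.8,
    "BF16": 23.8,
}


def pick_quant(memory_gb, headroom_gb=2.0):
    """Return the largest quant whose file fits in memory_gb minus headroom_gb.

    headroom_gb is an assumed allowance for KV cache, mmproj, and runtime
    overhead; tune it for your context length and hardware.
    Returns None when nothing fits.
    """
    budget = memory_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None
    # max() on (size, name) tuples selects the largest file; equal sizes
    # are resolved by comparing the names as strings.
    _, name = max(fitting)
    return name


if __name__ == "__main__":
    for mem in (8, 12, 16):
        print(f"{mem} GB -> {pick_quant(mem)}")
```

File size is only a lower bound on memory use, so treat the result as a starting point and step down one quant level if loading fails or long contexts exhaust memory.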
llama.cpp
LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp
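
As a rough sketch of how these files might be used with llama.cpp, the snippet below assembles a `llama-server` command line that pairs a quantized model file with one of the mmproj files. The `-m`, `-c`, `--port`, and `--mmproj` options are existing llama.cpp flags; the `build_server_command` helper, the context size, and the port are illustrative assumptions, and whether a given llama.cpp build supports this model is not confirmed here.

```python
import shlex


def build_server_command(model_path, mmproj_path=None, ctx_size=8192, port=8080):
    """Build an argv list for llama.cpp's llama-server without running it.

    ctx_size and port are assumed defaults chosen for illustration.
    """
    cmd = ["llama-server", "-m", model_path, "-c", str(ctx_size), "--port", str(port)]
    if mmproj_path:
        # The multimodal projector file is passed separately from the model.
        cmd += ["--mmproj", mmproj_path]
    return cmd


if __name__ == "__main__":
    cmd = build_server_command(
        "gemma-4-12B-it.Q4_K_M.gguf",
        "gemma-4-12B-it.mmproj-f16.gguf",
    )
    # Print a shell-ready command; pass cmd to subprocess.run(cmd) to launch it.
    print(shlex.join(cmd))
```

Building the argument list separately from launching the process makes it easy to inspect the command or swap in a different quant before starting the server.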
