gemma-4-12B-it-GGUF

This repository provides GGUF quantizations of google/gemma-4-12B-it, a 12B-parameter instruction-tuned multimodal model from Google DeepMind, released June 3, 2026. It is the first mid-sized Gemma 4 model to accept native audio input alongside text, images, and video. The model is unified and encoder-free: every modality flows directly into the LLM backbone instead of passing through separate vision or audio encoders, which reduces latency and memory footprint.

On standard benchmarks it performs close to Gemma 4 26B MoE while requiring less than half the total memory (about 16 GB of VRAM or unified memory), so it can run on consumer laptops with 16 GB of RAM. It also includes Multi-Token Prediction (MTP) drafters for lower latency and offers strong agentic reasoning for multi-step workflows.

Gemma 4 12B is released under Apache 2.0 and is supported across the developer ecosystem, including Ollama, vLLM, and LM Studio. It is well suited to real-time audio and visual understanding, image analysis, content categorization, context compression, and local-first AI applications that run without an API dependency.

Model Files

File Name	Quant Type	File Size
gemma-4-12B-it.BF16.gguf	BF16	23.8 GB
gemma-4-12B-it.F16.gguf	F16	23.8 GB
gemma-4-12B-it.Q2_K.gguf	Q2_K	4.83 GB
gemma-4-12B-it.Q3_K_L.gguf	Q3_K_L	6.57 GB
gemma-4-12B-it.Q3_K_M.gguf	Q3_K_M	6.09 GB
gemma-4-12B-it.Q3_K_S.gguf	Q3_K_S	5.53 GB
gemma-4-12B-it.Q4_0.gguf	Q4_0	6.98 GB
gemma-4-12B-it.Q4_K_M.gguf	Q4_K_M	7.38 GB
gemma-4-12B-it.Q4_K_S.gguf	Q4_K_S	7.02 GB
gemma-4-12B-it.Q5_0.gguf	Q5_0	8.34 GB
gemma-4-12B-it.Q5_K_M.gguf	Q5_K_M	8.55 GB
gemma-4-12B-it.Q5_K_S.gguf	Q5_K_S	8.34 GB
gemma-4-12B-it.Q6_K.gguf	Q6_K	9.79 GB
gemma-4-12B-it.Q8_0.gguf	Q8_0	12.7 GB
gemma-4-12B-it.mmproj-bf16.gguf	mmproj-bf16	175 MB
gemma-4-12B-it.mmproj-f16.gguf	mmproj-f16	175 MB
gemma-4-12B-it.mmproj-q8_0.gguf	mmproj-q8_0	159 MB
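As a rough guide to choosing a file, the sizes in the table can be converted into average bits per weight and compared against a memory budget. The sketch below is a hypothetical helper, not part of this repository or of llama.cpp. It assumes 1 GB = 10^9 bytes, uses the nominal 12B parameter count (so the bits-per-weight figures are approximate), and reserves an assumed 2 GB of headroom for the KV cache, the mmproj file, and runtime buffers. Real memory use depends on context length and backend.

```python
from typing import Dict, Optional

# File sizes in GB, copied from the Model Files table (mmproj files excluded).
QUANT_SIZES_GB: Dict[str, float] = {
    "Q2_K": 4.83,
    "Q3_K_S": 5.53,
    "Q3_K_M": 6.09,
    "Q3_K_L": 6.57,
    "Q4_0": 6.98,
    "Q4_K_S": 7.02,
    "Q4_K_M": 7.38,
    "Q5_0": 8.34,
    "Q5_K_S": 8.34,
    "Q5_K_M": 8.55,
    "Q6_K": 9.79,
    "Q8_0": 12.7,
    "F16": 23.8,
    "BF16": 23.8,
}

PARAMS = 12e9  # nominal parameter count from the model card


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average stored bits per parameter, treating 1 GB as 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


def pick_quant(budget_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant whose file fits in (budget - headroom).

    headroom_gb is an assumed allowance for KV cache, mmproj and runtime
    buffers; returns None when no file fits.
    """
    usable = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    if not fitting:
        return None
    # Largest file wins; ties on size are broken by name for determinism.
    return max(fitting)[1]
```

For example, `pick_quant(16.0)` returns `"Q8_0"` and `pick_quant(8.0)` returns `"Q3_K_S"` with the default headroom, while `bits_per_weight(7.38)` gives about 4.92 bits for Q4_K_M. Raising `headroom_gb` for long contexts shifts the choice toward smaller quants.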

llama.cpp

LLM inference in C/C++: https://github.com/ggml-org/llama.cpp
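The Model Files table pairs the language-model GGUFs with separate mmproj files. In llama.cpp, multimodal prompts are typically run with the `llama-mtmd-cli` tool, which takes the model via `-m` and the projector via `--mmproj`. The sketch below only assembles and runs that command line; it assumes a llama.cpp build recent enough to support the `gemma4` architecture, with both files already downloaded. The helper names `mtmd_command` and `run` are hypothetical, and the file and image paths are placeholders; check `llama-mtmd-cli --help` on your build before relying on these flags.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def mtmd_command(
    model: Path,
    mmproj: Path,
    prompt: str,
    image: Optional[Path] = None,
    binary: str = "llama-mtmd-cli",
) -> List[str]:
    """Build an argv list for llama.cpp's multimodal CLI (hypothetical helper)."""
    cmd = [binary, "-m", str(model), "--mmproj", str(mmproj), "-p", prompt]
    if image is not None:
        # Attach an image input; omitted for text-only prompts.
        cmd += ["--image", str(image)]
    return cmd


def run(cmd: List[str]) -> None:
    """Echo the command in shell-quoted form, then execute it."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

A typical call would be `run(mtmd_command(Path("gemma-4-12B-it.Q4_K_M.gguf"), Path("gemma-4-12B-it.mmproj-f16.gguf"), "Describe this image.", image=Path("photo.jpg")))`. Building an argument list rather than a shell string avoids quoting problems with prompts that contain spaces or quotes.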


GGUF

Model size: 12B params

Architecture: gemma4

Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for prithivMLmods/gemma-4-12B-it-GGUF

Base model: google/gemma-4-12B

Finetuned: google/gemma-4-12B-it

Quantized: this model (one of 165 quantized versions)

Collection including prithivMLmods/gemma-4-12B-it-GGUF

Collection of gemma 4/4[MoE] GGUF (4 items)

URL: https://huggingface.co/prithivMLmods/gemma-4-12B-it-GGUF
