URL: https://huggingface.co/prithivMLmods/gemma-4-26B-A4B-it-qat-GGUF

gemma-4-26B-A4B-it-qat-GGUF

google/gemma-4-26B-A4B-it-qat-q4_0-unquantized is a Mixture-of-Experts (MoE), instruction-tuned multimodal model from Google DeepMind and part of the Gemma 4 family. It has 25.2 billion total parameters, of which only 3.8 billion are active during inference, and it is trained with Quantization-Aware Training (QAT) to preserve near-bfloat16 quality at significantly reduced memory requirements.

Its sparse MoE architecture activates only a small subset of its 128 experts (plus 1 shared expert) per token, about 4B parameters in total, across 30 layers with a 1024-token sliding window. As a result it runs nearly as fast as a dedicated 4B model while delivering quality competitive with much larger dense models: 82.6% on MMLU Pro, 82.3% on GPQA Diamond, 88.3% on AIME 2026, 77.1% on LiveCodeBench v6, 73.8% on MMMU Pro (vision), and 44.1% on the 256K long-context MRCR v2 task.

The model supports text and image inputs (no audio) with a 256K-token context window, a ~550M-parameter vision encoder, and a 262K-token vocabulary covering 140+ languages. It handles image understanding, OCR, video frame analysis, native function calling, and a configurable thinking/reasoning mode. The Q4_0 unquantized variant provides half-precision weights extracted from the QAT pipeline, which makes it suited to custom downstream compilation and to research targeting high-throughput, cost-efficient server-side deployment.
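To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing with one always-on shared expert. It is an illustration only: the value of `k`, the router logits, and the toy scalar experts are hypothetical, and the description above does not specify how Gemma 4 actually routes tokens.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_forward(x, experts, shared_expert, router_logits, k):
    """Combine the shared expert with the k routed experts for one token."""
    out = shared_expert(x)
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)
    return out

# Toy setup (hypothetical): 128 scalar "experts", where expert i multiplies by i,
# plus one shared expert that passes the input through unchanged.
experts = [lambda x, s=i: s * x for i in range(128)]
shared = lambda x: x

logits = [0.0] * 128
logits[5] = 2.0
logits[17] = 2.0

routes = top_k_route(logits, k=2)
y = moe_forward(1.0, experts, shared, logits, k=2)
```

Only the selected experts run for a given token, which is why the per-token compute tracks the active parameter count rather than the total.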

Google DeepMind’s Gemma 4 Quantization-Aware Training (QAT) releases compress models by simulating lower precision during the training process itself. This drastically reduces VRAM requirements and accelerates local inference on consumer hardware and mobile devices while keeping quality close to that of the uncompressed baselines.
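"Simulating lower precision" is commonly done with fake quantization: weights are rounded to a low-bit grid and immediately converted back to floats, so training sees the rounding error. The sketch below assumes a symmetric 4-bit scheme with one scale per block. The block size of 32 mirrors the Q4_0 layout in llama.cpp, but the rounding rule is a simplification and is not claimed to match Google's QAT recipe or llama.cpp's exact quantizer.

```python
def fake_quantize_block(block, bits=4):
    """Quantize one block to signed integers, then dequantize back to floats.

    Assumption: symmetric round-to-nearest with a single scale per block.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)       # all-zero block: nothing to quantize
    scale = amax / qmax
    out = []
    for v in block:
        q = max(qmin, min(qmax, round(v / scale)))  # integer code in [qmin, qmax]
        out.append(q * scale)                       # value the forward pass would see
    return out

def fake_quantize(weights, block_size=32, bits=4):
    """Apply block-wise fake quantization to a flat list of weights."""
    out = []
    for start in range(0, len(weights), block_size):
        out.extend(fake_quantize_block(weights[start:start + block_size], bits))
    return out

weights = [0.7, -0.35, 0.1, 0.0]
simulated = fake_quantize(weights)
max_error = max(abs(a - b) for a, b in zip(weights, simulated))
```

In a real QAT loop the forward pass would use the fake-quantized weights while gradients update the full-precision copies, typically through a straight-through estimator; that part is omitted here.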

Model Files

| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| gemma-4-26B-A4B-it-qat.BF16.gguf | BF16 | 50.5 GB | Download |
| gemma-4-26B-A4B-it-qat.F16.gguf | F16 | 50.5 GB | Download |
| gemma-4-26B-A4B-it-qat.F32.gguf | F32 | 101 GB | Download |
| gemma-4-26B-A4B-it-qat.Q2_K.gguf | Q2_K | 10.6 GB | Download |
| gemma-4-26B-A4B-it-qat.Q3_K_L.gguf | Q3_K_L | 13.8 GB | Download |
| gemma-4-26B-A4B-it-qat.Q3_K_M.gguf | Q3_K_M | 13.3 GB | Download |
| gemma-4-26B-A4B-it-qat.Q3_K_S.gguf | Q3_K_S | 12.2 GB | Download |
| gemma-4-26B-A4B-it-qat.Q4_0.gguf | Q4_0 | 14.4 GB | Download |
| gemma-4-26B-A4B-it-qat.Q4_K_M.gguf | Q4_K_M | 16.8 GB | Download |
| gemma-4-26B-A4B-it-qat.Q4_K_S.gguf | Q4_K_S | 15.5 GB | Download |
| gemma-4-26B-A4B-it-qat.Q5_0.gguf | Q5_0 | 17.5 GB | Download |
| gemma-4-26B-A4B-it-qat.Q5_K_M.gguf | Q5_K_M | 19.1 GB | Download |
| gemma-4-26B-A4B-it-qat.Q5_K_S.gguf | Q5_K_S | 18 GB | Download |
| gemma-4-26B-A4B-it-qat.Q6_K.gguf | Q6_K | 22.6 GB | Download |
| gemma-4-26B-A4B-it-qat.Q8_0.gguf | Q8_0 | 26.9 GB | Download |
| gemma-4-26B-A4B-it-qat.mmproj-bf16.gguf | mmproj-bf16 | 1.19 GB | Download |
| gemma-4-26B-A4B-it-qat.mmproj-f16.gguf | mmproj-f16 | 1.19 GB | Download |
| gemma-4-26B-A4B-it-qat.mmproj-f32.gguf | mmproj-f32 | 2.29 GB | Download |
| gemma-4-26B-A4B-it-qat.mmproj-q8_0.gguf | mmproj-q8_0 | 806 MB | Download |
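The file sizes above can be turned into a rough bits-per-weight figure and used to pick a quant for a given memory budget. This is a back-of-the-envelope sketch: it assumes 1 GB = 10^9 bytes, uses the 25.2 billion total parameter count from the description, and ignores GGUF metadata. The `overhead_gb` allowance for the KV cache and runtime buffers is a hypothetical placeholder, not a measured value.

```python
TOTAL_PARAMS = 25.2e9  # total parameter count quoted in the model description

# Main model files from the table (quant type -> file size in GB).
FILE_SIZES_GB = {
    "BF16": 50.5, "F16": 50.5, "F32": 101.0,
    "Q2_K": 10.6, "Q3_K_L": 13.8, "Q3_K_M": 13.3, "Q3_K_S": 12.2,
    "Q4_0": 14.4, "Q4_K_M": 16.8, "Q4_K_S": 15.5,
    "Q5_0": 17.5, "Q5_K_M": 19.1, "Q5_K_S": 18.0,
    "Q6_K": 22.6, "Q8_0": 26.9,
}

def bits_per_weight(size_gb, params=TOTAL_PARAMS):
    """Approximate average bits stored per parameter for a file of size_gb."""
    return size_gb * 1e9 * 8 / params

def largest_fit(budget_gb, overhead_gb=2.0):
    """Return the largest quant whose file plus overhead fits the budget, or None."""
    fitting = {q: s for q, s in FILE_SIZES_GB.items() if s + overhead_gb <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None

q4_0_bpw = round(bits_per_weight(FILE_SIZES_GB["Q4_0"]), 2)
choice_24gb = largest_fit(24.0)
```

The estimate for Q4_0 comes out above 4 bits per weight because each block also stores a scale, and some tensors are typically kept at higher precision.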

llama.cpp

LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp
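A minimal sketch of fetching a file from this repository and assembling a llama.cpp command line. It assumes the third-party `huggingface_hub` package for downloads and a llama.cpp build that supports the gemma4 architecture and provides the `llama-cli` and `llama-mtmd-cli` binaries on `PATH`. The choice of Q4_0 with the f16 projector, the context size, and the helper names are illustrative, not a recommendation from the model card.

```python
import shlex

REPO_ID = "prithivMLmods/gemma-4-26B-A4B-it-qat-GGUF"
MODEL_FILE = "gemma-4-26B-A4B-it-qat.Q4_0.gguf"
MMPROJ_FILE = "gemma-4-26B-A4B-it-qat.mmproj-f16.gguf"

def fetch(filename):
    """Download one file from the repo and return its local path (needs network)."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=filename)

def build_command(model_path, prompt, mmproj_path=None, image_path=None, ctx=8192):
    """Build an argv list for a text-only or image-plus-text run."""
    if image_path is not None and mmproj_path is None:
        raise ValueError("image input requires an mmproj file")
    if mmproj_path is not None:
        # Multimodal run: the projector file is passed alongside the main model.
        cmd = ["llama-mtmd-cli", "-m", model_path, "--mmproj", mmproj_path]
        if image_path is not None:
            cmd += ["--image", image_path]
    else:
        cmd = ["llama-cli", "-m", model_path]
    cmd += ["-c", str(ctx), "-p", prompt]
    return cmd

text_cmd = build_command(MODEL_FILE, "Hello")
vision_cmd = build_command(MODEL_FILE, "Describe this image.",
                           mmproj_path=MMPROJ_FILE, image_path="photo.png")
print(shlex.join(text_cmd))
print(shlex.join(vision_cmd))
```

To run for real, replace the bare file names with the paths returned by `fetch(MODEL_FILE)` and `fetch(MMPROJ_FILE)`, then pass the list to `subprocess.run`.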

Format: GGUF
Model size: 25B params
Architecture: gemma4