gemma-4-E4B-it-GGUF

Gemma-4-E4B-it from Google is a multimodal dense model in the Gemma 4 family with 4.5B effective parameters (8B total including Per-Layer Embeddings). It is optimized for edge deployment on laptops, high-end smartphones, and consumer GPUs, with native support for text, images (variable aspect ratio and resolution), and audio, plus configurable thinking modes for step-by-step reasoning. The architecture has 42 layers, a 512-token sliding window, a 128K context length, and a 262K vocabulary.

Target use cases include agentic workflows, multilingual OCR and handwriting recognition, document/PDF parsing, UI/screen analysis, chart interpretation, object detection with pointing, coding assistance, and low-latency speech-to-text understanding. The model is positioned as competitive with models 10-20x larger on these tasks while retaining Google's production-grade safety alignment. The instruction-tuned variant is aimed at on-device autonomous agents via Android AICore and Qualcomm optimizations. Its open weights enable local-first inference (MediaTek/ARM CPUs, NVIDIA RTX) for privacy-focused applications such as mobile IDEs, real-time document processing, and structured data extraction in resource-constrained environments.

Model Files

File Name	Quant Type	File Size	File Link
gemma-4-E4B-it.BF16.gguf	BF16	15.1 GB	Download
gemma-4-E4B-it.F16.gguf	F16	15.1 GB	Download
gemma-4-E4B-it.Q2_K.gguf	Q2_K	4.4 GB	Download
gemma-4-E4B-it.Q3_K_L.gguf	Q3_K_L	5.02 GB	Download
gemma-4-E4B-it.Q3_K_M.gguf	Q3_K_M	4.85 GB	Download
gemma-4-E4B-it.Q3_K_S.gguf	Q3_K_S	4.65 GB	Download
gemma-4-E4B-it.Q4_0.gguf	Q4_0	5.19 GB	Download
gemma-4-E4B-it.Q4_K_M.gguf	Q4_K_M	5.34 GB	Download
gemma-4-E4B-it.Q4_K_S.gguf	Q4_K_S	5.2 GB	Download
gemma-4-E4B-it.Q5_0.gguf	Q5_0	5.69 GB	Download
gemma-4-E4B-it.Q5_K_M.gguf	Q5_K_M	5.76 GB	Download
gemma-4-E4B-it.Q5_K_S.gguf	Q5_K_S	5.69 GB	Download
gemma-4-E4B-it.Q6_K.gguf	Q6_K	6.22 GB	Download
gemma-4-E4B-it.Q8_0.gguf	Q8_0	8.01 GB	Download
gemma-4-E4B-it.mmproj-bf16.gguf	mmproj-bf16	992 MB	Download
gemma-4-E4B-it.mmproj-f16.gguf	mmproj-f16	992 MB	Download
gemma-4-E4B-it.mmproj-q8_0.gguf	mmproj-q8_0	560 MB	Download
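
Choosing a file from the table above usually comes down to how much memory is available. The following is a minimal sketch, not an official tool: it copies the sizes from the table, picks the largest quant that fits a memory budget, and builds a download URL. The helper names (`pick_quant`, `file_name`, `download_url`), the 1.5 GB headroom default, and the idea of using file size as a stand-in for quality are all assumptions made for illustration. The URL assumes the files sit at the repository root on the `main` branch and follows the usual Hugging Face `resolve` pattern.

```python
from typing import Optional

REPO_ID = "prithivMLmods/gemma-4-E4B-it-GGUF"

# Quant type -> file size in GB, copied from the Model Files table.
QUANT_SIZES_GB = {
    "BF16": 15.1, "F16": 15.1,
    "Q2_K": 4.4,
    "Q3_K_L": 5.02, "Q3_K_M": 4.85, "Q3_K_S": 4.65,
    "Q4_0": 5.19, "Q4_K_M": 5.34, "Q4_K_S": 5.2,
    "Q5_0": 5.69, "Q5_K_M": 5.76, "Q5_K_S": 5.69,
    "Q6_K": 6.22,
    "Q8_0": 8.01,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.5) -> Optional[str]:
    """Return the largest quant whose file fits in budget_gb minus headroom_gb.

    headroom_gb is a hypothetical allowance for the KV cache, the mmproj
    file, and runtime overhead; tune it for your own setup.
    """
    limit = budget_gb - headroom_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= limit}
    if not fitting:
        return None
    # Assumption: a larger file means less aggressive quantization.
    return max(fitting, key=fitting.get)


def file_name(quant: str) -> str:
    """File naming pattern used in the Model Files table."""
    return f"gemma-4-E4B-it.{quant}.gguf"


def download_url(quant: str) -> str:
    """Assumed direct-download URL (repository root, main branch)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{file_name(quant)}"


if __name__ == "__main__":
    choice = pick_quant(budget_gb=8.0)
    if choice is None:
        print("No quant fits the budget")
    else:
        print(choice, download_url(choice))
```

Image and audio input additionally need one of the `mmproj` files listed in the table; its size is not included in the budget check above beyond the headroom allowance.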

llama.cpp

LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp
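
As a rough illustration of how the model and `mmproj` files fit together, the sketch below assembles a command line for llama.cpp's multimodal CLI without running it. It assumes the binary is named `llama-mtmd-cli` and accepts `-m`, `--mmproj`, `--image`, and `-p`, which matches recent llama.cpp builds as far as I know; check `--help` on your installed version, and note that this page does not state which llama.cpp release supports the `gemma4` architecture. The function name `build_mtmd_command` and the image path are hypothetical.

```python
import shlex
from typing import List, Optional


def build_mtmd_command(
    model_path: str,
    mmproj_path: str,
    prompt: str,
    image_path: Optional[str] = None,
    binary: str = "llama-mtmd-cli",  # assumed binary name; verify locally
) -> List[str]:
    """Build (but do not execute) an argv list for a multimodal llama.cpp run."""
    cmd = [binary, "-m", model_path, "--mmproj", mmproj_path]
    if image_path is not None:
        cmd += ["--image", image_path]
    cmd += ["-p", prompt]
    return cmd


if __name__ == "__main__":
    cmd = build_mtmd_command(
        model_path="gemma-4-E4B-it.Q4_K_M.gguf",
        mmproj_path="gemma-4-E4B-it.mmproj-f16.gguf",
        prompt="Describe this image.",
        image_path="example.png",  # hypothetical input file
    )
    # Print a shell-safe string; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to hand to `subprocess.run` without shell quoting issues.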

Downloads last month: 1,412

Format: GGUF
Model size: 8B params
Architecture: gemma4
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
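
The listed model size (8B params) and the file sizes in the Model Files table allow a back-of-the-envelope estimate of average bits per weight for each file. This is only a sketch: it assumes the "GB" figures are decimal gigabytes, treats "8B" as exactly 8e9 parameters, and ignores GGUF metadata, so the results are approximate. The function name `bits_per_weight` is made up for this example.

```python
PARAMS = 8e9  # "8B params" from this page, treated as exact (assumption)


def bits_per_weight(file_size_gb: float, params: float = PARAMS) -> float:
    """Approximate average bits per parameter for a file of the given size."""
    size_bits = file_size_gb * 1e9 * 8  # decimal GB -> bytes -> bits
    return size_bits / params


if __name__ == "__main__":
    # Sizes copied from the Model Files table.
    for quant, size_gb in [("Q4_K_M", 5.34), ("Q8_0", 8.01), ("BF16", 15.1)]:
        print(f"{quant}: ~{bits_per_weight(size_gb):.2f} bits/weight")
```

Because the parameter count is rounded and the averages span all tensors in a file, these values are a coarse guide and need not match the nominal bit width in the quant name.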

Model tree for prithivMLmods/gemma-4-E4B-it-GGUF

Base model: google/gemma-4-E4B
Finetuned: google/gemma-4-E4B-it
Quantized (241): this model

Collection including prithivMLmods/gemma-4-E4B-it-GGUF

Collection of gemma 4/4[MoE] GGUF (4 items)

URL: https://huggingface.co/prithivMLmods/gemma-4-E4B-it-GGUF
