Parameters: 5.1B
Context Length: 128K
Modality: Multimodal
Architecture: Dense
License: Apache 2.0
Release Date: 2 Apr 2026
Knowledge Cutoff: -

Attention
Attention Structure: Grouped-Query Attention
Attention Heads: 8
Key-Value Heads: 1
Attention Head Dimension: 256
Position Embedding: RoPE
RoPE Theta: 10,000
Sliding Window Attention: Yes
Sliding Window Size: 512
Normalization: RMS Normalization
Activation Function: GELU

Dimensions
Hidden Dimension Size: 6,144
Number of Layers: 35
FFN Intermediate Size (Dense): 6,144
Multi-Token Prediction Heads: -

Tokenizer
Vocabulary Size: 262,144
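
The attention figures above (35 layers, 1 key-value head, head dimension 256) allow a rough estimate of KV-cache memory. The sketch below is only an illustration: it assumes 16-bit cache entries and takes 128K to mean 128 × 1,024 tokens. The card does not say how many layers use the 512-token sliding window, so the code brackets the answer between an upper bound (every layer caches the full context) and a lower bound (every layer caches only the window). The function and parameter names are hypothetical.

```python
def kv_cache_bytes(tokens, layers=35, kv_heads=1, head_dim=256, bytes_per_value=2):
    """Bytes needed to cache keys and values for `tokens` positions.

    Defaults are taken from the spec card; bytes_per_value=2 (16-bit) is an assumption.
    """
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(128 * 1024)  # upper bound: every layer caches all tokens
windowed = kv_cache_bytes(512)             # lower bound: every layer caches only the window

print(per_token)             # 35840 bytes per token
print(full_context / 2**30)  # 4.375 GiB
print(windowed / 2**20)      # 17.5 MiB
```

The real footprint depends on the mix of sliding-window and full-attention layers and on the cache precision the runtime uses, so it should fall somewhere between the two bounds.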
Gemma 4 E2B is a compact model with 2.3B effective parameters (5.1B with Per-Layer Embeddings) designed for mobile and IoT devices. It supports text, image, and audio input with a 128K context window, and is intended to run on edge devices with low latency and fully offline. It features a built-in reasoning mode and native function calling for agentic workflows.
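
As a rough guide to what these parameter counts mean for on-device memory, weight storage can be approximated as parameter count times bytes per parameter. The sketch below uses the 5.1B and 2.3B figures quoted above. The precision table is a generic assumption, not a list of formats the model ships in, and real quantized files add overhead for scales and metadata. Whether the Per-Layer Embeddings must sit in fast memory depends on the runtime, so both counts are shown.

```python
# Assumed storage widths in bits per parameter (generic, not model-specific).
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gb(params_billion, precision):
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * BITS_PER_PARAM[precision] / 8
    return bytes_total / 1e9


for label, params in [("total (with Per-Layer Embeddings)", 5.1), ("effective", 2.3)]:
    for precision in BITS_PER_PARAM:
        print(f"{label:35s} {precision}: {weight_gb(params, precision):.2f} GB")
```

Under these assumptions the total parameter count comes to about 10.2 GB at 16 bits and about 2.55 GB at 4 bits, before any activation or KV-cache memory is counted.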
Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. Its multimodal models handle text and images, with audio supported on the smaller variants, and offer context windows of up to 256K tokens. Gemma 4 targets reasoning, coding, and agentic workflows, with an emphasis on capability per parameter across deployments ranging from mobile devices to enterprise servers. It is released under the Apache 2.0 license.
No evaluation benchmarks are available for Gemma 4 E2B.
Overall Rank: -
Coding Rank: -
Total Score: B (66 / 100)
©2025 ApX Machine Learning