| Property | Value |
|---|---|
| Total Parameters | 25.2B |
| Context Length | 256K |
| Modality | Multimodal |
| Architecture | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Release Date | 2 Apr 2026 |
| Knowledge Cutoff | - |
Attention

| Property | Value |
|---|---|
| Attention Structure | Grouped-Query Attention |
| Attention Heads | 16 |
| Key-Value Heads | 8 |
| Attention Head Dimension | 256 |
| Position Embedding | RoPE |
| RoPE Theta | 10,000 |
| Sliding Window Attention | Yes |
| Sliding Window Size | 1,024 |
| Normalization | RMS Normalization |
| Activation Function | GELU |
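To make the attention figures concrete, the following is a minimal sketch of how they fit together. It assumes contiguous grouping of query heads onto key-value heads, the standard RoPE frequency formula, and a causal window that counts the current token; none of these conventions are confirmed by this page, and the function names are illustrative only.

```python
# Values taken from the attention specs listed above.
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 256
ROPE_THETA = 10_000.0
SLIDING_WINDOW = 1_024


def kv_head_for(query_head: int) -> int:
    """Map a query head to the KV head it shares (contiguous grouping is an assumption)."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return query_head // group_size


def rope_inverse_frequencies(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """Standard RoPE frequencies: theta ** (-2i / head_dim), one per dimension pair."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def visible_range(position: int, window: int = SLIDING_WINDOW) -> range:
    """Positions a token can attend to in a sliding-window layer.

    Assumed convention: the current token plus the previous window - 1 tokens.
    """
    return range(max(0, position - window + 1), position + 1)
```

With 16 query heads and 8 key-value heads, each key-value head serves two query heads, which halves the key-value cache relative to standard multi-head attention.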
Dimensions

| Property | Value |
|---|---|
| Hidden Dimension Size | 2,112 |
| Number of Layers | 30 |
| FFN Intermediate Size (Dense) | 704 |
| Multi-Token Prediction Heads | - |
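Combining the layer count with the key-value head count and head dimension listed under Attention gives a rough upper bound on key-value cache size. The sketch below assumes every layer caches the full context at 16-bit precision and that 256K means 262,144 tokens. This page does not say which layers use the 1,024-token sliding window, so the real footprint is likely smaller, since those layers only need to keep the most recent 1,024 positions.

```python
# Values taken from the spec lists above.
NUM_LAYERS = 30
NUM_KV_HEADS = 8
HEAD_DIM = 256


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Upper-bound KV cache size, assuming all layers cache every token."""
    values_per_token_per_layer = 2 * NUM_KV_HEADS * HEAD_DIM  # keys + values
    return num_tokens * NUM_LAYERS * values_per_token_per_layer * bytes_per_value


per_token = kv_cache_bytes(1)           # 245,760 bytes = 240 KiB
full_context = kv_cache_bytes(262_144)  # 60 GiB if every layer used full attention

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{full_context / 1024**3:.0f} GiB at 262,144 tokens (upper bound)")
```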
Tokenizer

| Property | Value |
|---|---|
| Vocabulary Size | 262,144 |
Mixture of Experts

| Property | Value |
|---|---|
| Total Expert Parameters | 3.8B |
| Number of Experts | 128 |
| Active Experts | 8 |
| Shared Experts | - |
| FFN Intermediate Size (per Expert) | 704 |
| Dense Layers Before MoE | - |
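The expert counts above imply that a router picks 8 of 128 experts for each token. The following is a generic top-k routing sketch, not the model's actual router: the softmax over only the selected logits is an assumed normalisation, and the random logits stand in for a learned router's output.

```python
import math
import random

# Values taken from the Mixture of Experts specs above.
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 8


def route(router_logits: list[float], k: int = ACTIVE_EXPERTS) -> list[tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1 (assumed normalisation).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router output for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)

for expert_index, weight in selected:
    print(f"expert {expert_index:3d}  weight {weight:.3f}")
```

Only the selected experts' feed-forward blocks run for that token, so 8 / 128 = 6.25% of the expert blocks are evaluated per token in each MoE layer.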
Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total parameters, of which only 3.8B are active per token, so it runs at roughly the speed of a 4B model while delivering performance close to that of a 31B model. It has 128 experts (8 active) and a 256K context window, and it accepts text and image input. It is optimized for fast inference on consumer GPUs while delivering frontier-level reasoning and coding capabilities.
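The two parameter counts in the description affect different resources. As a rough sketch, assuming all experts stay resident in memory, weight storage follows the 25.2B total while per-token compute follows the 3.8B active parameters. The estimate below covers weights only and ignores the key-value cache, activations, and any quantization overhead.

```python
# Values taken from the description above.
TOTAL_PARAMS = 25.2e9
ACTIVE_PARAMS = 3.8e9


def weight_gigabytes(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10**9 bytes) at a given precision."""
    return num_params * bits_per_weight / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # about 0.151

for bits in (16, 8, 4):
    print(f"{bits:2d}-bit weights: {weight_gigabytes(TOTAL_PARAMS, bits):.1f} GB")
print(f"active fraction per token: {active_fraction:.1%}")
```

Under these assumptions the weights take about 50.4 GB at 16 bits and about 12.6 GB at 4 bits.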
Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. These multimodal models handle text and images, plus audio on the smaller variants, with context windows of up to 256K tokens. Designed for frontier-level performance across reasoning, coding, and agentic workflows, Gemma 4 targets high intelligence-per-parameter on hardware ranging from mobile devices to enterprise servers. It is released under the Apache 2.0 license.
Rank
#40
| Benchmark | Score | Rank |
|---|---|---|
| General Text (Text Arena) | 1438 | 37 |
| Web Development (WebDev Arena) | 1360 | 54 |
Overall Rank
#40
Coding Rank
#63
Total Score
B
70 / 100
©2025 ApX Machine Learning