URL: https://apxml.com/models/gemma-4-26b-a4b

Gemma 4 26B A4B: Specifications and GPU VRAM Requirements


Gemma 4 26B A4B

Total Parameters

25.2B

Context Length

256K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

2 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

16

Key-Value Heads

8

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

1,024

Normalization

RMS Normalization

Activation Function

GELU
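
The attention figures above imply that the 16 query heads share 8 key-value heads, and that sliding-window layers restrict each token to a local span of 1,024 positions. The following is a minimal sketch of both ideas, not the model's actual implementation. The function names are hypothetical, the contiguous grouping of query heads is the usual Grouped-Query Attention convention rather than something stated on this page, and the window is assumed to include the current token.

```python
NUM_Q_HEADS = 16    # "Attention Heads" from the spec sheet
NUM_KV_HEADS = 8    # "Key-Value Heads" from the spec sheet
WINDOW = 1024       # "Sliding Window Size" from the spec sheet


def kv_head_for(q_head: int) -> int:
    """Map a query head to the key-value head it shares (assumed contiguous groups)."""
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return q_head // group_size


def can_attend(query_pos: int, key_pos: int, window: int = WINDOW) -> bool:
    """Causal sliding window: the key must not be in the future and must lie
    within the last `window` positions, counting the query position itself
    (an assumed convention)."""
    return 0 <= query_pos - key_pos < window


# Query heads 0 and 1 read the same KV head; head 15 reads the last one.
head_groups = [kv_head_for(h) for h in range(NUM_Q_HEADS)]
```

With 8 key-value heads instead of 16, the key and value projections and the KV cache are half the size they would be under standard multi-head attention with the same head dimension.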

Dimensions

Hidden Dimension Size

2,112

Number of Layers

30

FFN Intermediate Size (Dense)

704

Multi-Token Prediction Heads

-
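
The layer count, key-value head count, and head dimension listed above are enough for a rough KV-cache estimate. The sketch below is an upper bound under stated assumptions: every layer is treated as caching the full context, although the sliding-window layers would only need the last 1,024 tokens, and this page does not say how many layers are local versus global. The 2-byte value size (fp16/bf16) and the reading of 256K as 262,144 tokens are also assumptions, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(
    tokens: int,
    num_layers: int = 30,      # "Number of Layers"
    num_kv_heads: int = 8,     # "Key-Value Heads"
    head_dim: int = 256,       # "Attention Head Dimension"
    bytes_per_value: int = 2,  # assumption: fp16/bf16 cache
) -> int:
    """Upper-bound KV-cache size, assuming every layer caches all tokens."""
    # Factor of 2 covers one key vector and one value vector per KV head.
    per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_value
    return tokens * num_layers * per_token_per_layer


per_token = kv_cache_bytes(1)                       # 245,760 bytes (240 KiB)
at_1k = kv_cache_bytes(1024) / 2**20                # 240.0 MiB
at_full_context = kv_cache_bytes(262_144) / 2**30   # 60.0 GiB
```

Under these assumptions the cache grows linearly with context length, so long-context use can need more memory for the cache than for the quantized weights.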

Tokenizer

Vocabulary Size

262,144

Mixture of Experts

Total Expert Parameters

3.8B

Number of Experts

128

Active Experts

8

Shared Experts

-

FFN Intermediate Size (per Expert)

704

Dense Layers Before MoE

-
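
With 128 experts and 8 active, each token uses 8 / 128 = 6.25% of the experts in an MoE layer. The sketch below shows a generic top-k softmax router in plain Python to illustrate that selection. It is an assumed, textbook-style router, not Gemma 4's actual routing function, and `route` and the random logits are purely illustrative.

```python
import math
import random

NUM_EXPERTS = 128  # "Number of Experts"
TOP_K = 8          # "Active Experts"


def route(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and return (index, weight) pairs.
    Weights are a softmax over the selected logits only (an assumed convention)."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Illustrative router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)
active_fraction = TOP_K / NUM_EXPERTS  # 0.0625
```

The token's output would then be the weighted sum of the 8 selected experts' feed-forward outputs; the remaining 120 experts are skipped for that token.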

Gemma 4 26B A4B

Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total parameters, of which only about 3.8B are active per token. Its per-token compute is therefore close to that of a 4B dense model, while its quality is reported to approach that of a 31B model. It routes each token to 8 of its 128 experts, offers a 256K-token context window, and accepts text and image input. The model is optimized for fast inference on consumer GPUs and targets frontier-level reasoning and coding.

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. All models are multimodal, handling text and images, with audio supported on the smaller variants, and context windows reach up to 256K tokens. Gemma 4 is designed for frontier-level performance in reasoning, coding, and agentic workflows, with an emphasis on intelligence per parameter, across deployments from mobile devices to enterprise servers. The family is released under the Apache 2.0 license.


Evaluation Benchmarks

Rank

#40

Category          Benchmark      Score   Rank
General Text      Text Arena     1438    37
Web Development   WebDev Arena   1360    54

Rankings

Overall Rank

#40

Coding Rank

#63

Model Integrity

Total Score

B

70 / 100

GPU Requirements
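
As a rough guide, the memory needed for the weights alone can be estimated from the 25.2B total parameter count and the bit width of the chosen quantization. The sketch below is a back-of-the-envelope calculation, not the page's calculator. It assumes all experts stay resident in GPU memory (typical for MoE inference without offloading), and it ignores the KV cache, activations, framework overhead, and quantization metadata. The format names and `weight_gb` are illustrative.

```python
TOTAL_PARAMS = 25.2e9  # total parameters from the spec sheet

# Assumed bit widths for common weight formats (illustrative labels).
BITS_PER_PARAM = {"fp16/bf16": 16, "int8": 8, "int4": 4}


def weight_gb(total_params: float, bits: int) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits / 8 / 1e9


estimates = {name: weight_gb(TOTAL_PARAMS, bits)
             for name, bits in BITS_PER_PARAM.items()}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB for weights")
```

Although only about 3.8B parameters are active per token, the estimate uses the total count because any expert may be selected for the next token. Actual VRAM use will be higher once context-dependent memory is included.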
