
URL: https://apxml.com/models/gemma-4-e4b


Gemma 4 E4B

Parameters

8B

Context Length

128K

Modality

Multimodal

Architecture

Dense

License

Apache 2.0

Release Date

2 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

8

Key-Value Heads

2

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

512
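The attention figures above determine how queries share key-value heads and how large the KV cache can grow. The sketch below works through that arithmetic. It takes the head counts, head dimension, sliding window size, and layer count (42) from this spec sheet. Everything else is an assumption: 2 bytes per cached element (fp16/bf16), "128K" read as 131,072 tokens, and the names `AttentionSpec`, `queries_per_kv_head`, `kv_bytes_per_token`, and `kv_cache_bytes` are hypothetical helpers. The page does not say which layers use the sliding window, so the two results are bounds, not a prediction for the real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionSpec:
    # Values copied from the spec sheet.
    num_layers: int = 42
    num_query_heads: int = 8
    num_kv_heads: int = 2
    head_dim: int = 256
    sliding_window: int = 512


def queries_per_kv_head(spec: AttentionSpec) -> int:
    """Number of query heads that share one key-value head under GQA."""
    if spec.num_query_heads % spec.num_kv_heads != 0:
        raise ValueError("query heads must divide evenly across KV heads")
    return spec.num_query_heads // spec.num_kv_heads


def kv_bytes_per_token(spec: AttentionSpec, bytes_per_element: int = 2) -> int:
    """Bytes of K and V stored for one token across all layers.

    bytes_per_element=2 is an assumption (fp16/bf16 cache).
    """
    per_layer = 2 * spec.num_kv_heads * spec.head_dim * bytes_per_element
    return per_layer * spec.num_layers


def kv_cache_bytes(spec: AttentionSpec, context_tokens: int,
                   windowed: bool, bytes_per_element: int = 2) -> int:
    """KV cache size if every layer is global (windowed=False) or
    every layer keeps only the sliding window (windowed=True)."""
    cached = min(context_tokens, spec.sliding_window) if windowed else context_tokens
    return cached * kv_bytes_per_token(spec, bytes_per_element)


spec = AttentionSpec()
context = 128 * 1024  # assumption: "128K" means 131,072 tokens

print(queries_per_kv_head(spec))                            # 4
print(kv_bytes_per_token(spec))                             # 86016 (84 KiB)
print(kv_cache_bytes(spec, context, windowed=False) / 2**30)  # 10.5 GiB upper bound
print(kv_cache_bytes(spec, context, windowed=True) / 2**20)   # 42.0 MiB lower bound
```

With 2 key-value heads instead of 8, the cache is a quarter of what full multi-head attention would need at the same head dimension, and any layer restricted to the 512-token window stops growing its cache once the context passes 512 tokens.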

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

10,240

Number of Layers

42

FFN Intermediate Size (Dense)

10,240

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

262,144

Gemma 4 E4B

Gemma 4 E4B is an edge-optimized model with 4.5B effective parameters (8B when Per-Layer Embeddings are counted), intended for mobile and edge deployments. It supports multimodal input (text, image, audio) with a 128K-token context window. It delivers stronger performance than E2B while keeping on-device execution efficient. It also features a thinking mode and native function calling.

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. These multimodal models handle text and images, with audio supported on the smaller variants, and offer context windows of up to 256K tokens. Gemma 4 is designed for frontier-level performance across reasoning, coding, and agentic workflows, with an emphasis on intelligence-per-parameter from mobile devices to enterprise servers. It is released under the Apache 2.0 license.



Evaluation Benchmarks

No evaluation benchmarks for Gemma 4 E4B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B

68 / 100

GPU Requirements

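VRAM required depends on the quantization method chosen for the model weights and on the context size. The source page's calculator covers context sizes from 1k to 125k tokens (default: 1,024 tokens) and lists recommended GPUs for the selected configuration.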
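As a rough guide to how weight quantization changes the memory footprint, the sketch below multiplies the parameter count by the bytes stored per parameter. The 8B figure comes from this page and is treated as exactly 8,000,000,000 for illustration. The precision table and the helper name `weight_memory_gib` are assumptions, and the estimate ignores quantization metadata (scales, zero points), activations, and the KV cache, so real VRAM use will be higher.

```python
# Assumed storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Approximate memory for model weights alone, in GiB."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    return num_params * BYTES_PER_PARAM[precision] / 2**30


TOTAL_PARAMS = 8_000_000_000  # "8B" from the spec sheet, rounded

for precision in BYTES_PER_PARAM:
    gib = weight_memory_gib(TOTAL_PARAMS, precision)
    print(f"{precision}: {gib:.2f} GiB")
```

Under these assumptions the weights alone come to about 14.9 GiB at fp16, 7.45 GiB at int8, and 3.73 GiB at int4. Context-dependent memory is added on top of that.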