VOOZH about

URL: https://apxml.com/models/gemma-4-31b

⇱ Gemma 4 31B: Specifications and GPU VRAM Requirements


Gemma 4 31B

Parameters

30.7B

Context Length

256K

Modality

Multimodal

Architecture

Dense

License

Apache 2.0

Release Date

2 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

32

Key-Value Heads

16

Attention Head Dimension

256

Position Embedding

ROPE

RoPE Theta

1,000,000

Sliding Window Attention

Yes

Sliding Window Size

1,024

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

21,504

Number of Layers

60

FFN Intermediate Size (Dense)

21,504

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

262,144

Architecture Diagram

Gemma 4 31B

Gemma 4 31B is the flagship dense model with 30.7B parameters and 256K context window, delivering frontier intelligence for workstations and consumer GPUs. Supports text and image input with state-of-the-art performance on coding, reasoning, and multimodal understanding. Features configurable thinking mode and native function calling for advanced agentic workflows and IDE integration.

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. Featuring both Dense and Mixture-of-Experts (MoE) architectures, these multimodal models handle text, images, and audio (on smaller variants), with context windows up to 256K tokens. Designed for frontier-level performance across reasoning, coding, and agentic workflows, Gemma 4 delivers unprecedented intelligence-per-parameter from mobile devices to enterprise servers. Released under Apache 2.0 license.


Other Gemma 4 Models

Evaluation Benchmarks

Rank

#87

BenchmarkScoreRank

0.59

24

General Text

Text Arena

1451

25

Agentic Coding

LiveBench Agentic

0.40

31

0.74

32

0.59

37

Web Development

WebDev Arena

1377

48

0.60

52

Rankings

Overall Rank

#87

Coding Rank

#124

Model Integrity

Total Score

B

67 / 100

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
125k
250k

VRAM Required:

Recommended GPUs