GLM-5.1

Total Parameters

754B

Context Length

200K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

7 Apr 2026

Knowledge Cutoff

Technical Specifications

Attention

Attention Structure

Multi-Layer Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

Position Embedding

RoPE

RoPE Theta

1,000,000

Sliding Window Attention

Sliding Window Size

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

6,144

Number of Layers

78

FFN Intermediate Size (Dense)

2,048

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

154,880

Mixture of Experts

Total Expert Parameters

40.0B

Number of Experts

257

Active Experts

8 routed + 1 shared

Shared Experts

1

FFN Intermediate Size (per Expert)

2,048

Dense Layers Before MoE

GLM-5.1

GLM-5.1 is Z.ai's flagship model for long-horizon agentic coding tasks. It is built on the GlmMoeDSA architecture, with 754B total parameters across 78 layers and 257 experts (256 routed plus 1 shared), of which 8 routed experts plus the shared expert are active per token. The architecture combines Gated DeltaNet linear attention with standard attention and sparse MoE feed-forward networks, so only a small fraction of the expert parameters is used for each token at inference time. It reports state-of-the-art results of 58.4% on SWE-Bench Pro, 63.5% on Terminal-Bench 2.0, 95.3% on AIME 2026, and 86.2% on GPQA-Diamond. The model is designed for sustained autonomous execution of up to 8 hours, breaking complex engineering tasks into iterative experiment-analyze-optimize loops. It supports a 200K context window and up to 128K output tokens. It is available via API as glm-5.1 on Z.ai and BigModel.cn, and was released on April 7, 2026 under the MIT license.
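To make the "8+1 active per token" figure concrete, here is a minimal sketch of generic top-k expert routing. It is not GLM-5.1's actual router: the gating function, the random scores, and the `route_token` helper are assumptions for illustration. Only the expert counts come from the description above.

```python
import math
import random

NUM_ROUTED = 256  # routed experts, from the description above
TOP_K = 8         # routed experts active per token, from the description above


def route_token(logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax restricted to the selected experts, so they sum to 1.
    This is a generic top-k gate (assumption), not the model's real router.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]  # hypothetical router scores
selected = route_token(logits)

# The shared expert runs for every token, so 8 routed + 1 shared = 9 of 257 experts.
active_fraction = (TOP_K + 1) / (NUM_ROUTED + 1)
print(f"{len(selected)} routed experts selected, {active_fraction:.1%} of experts active")
```

Under these assumptions, each token touches 9 of 257 experts, which is why the compute per token is far below what the total parameter count suggests.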

About GLM-5.1

GLM-5.1 is Z.ai's next-generation flagship model for agentic engineering. It is built on a hybrid MoE architecture (GlmMoeDSA) that combines Gated DeltaNet linear attention layers with standard attention and sparse MoE feed-forward networks. It achieves state-of-the-art performance on SWE-Bench Pro (58.4%) and is designed for long-horizon autonomous tasks, with sustained execution of up to 8 hours. With 754B total parameters and a 200K context window, GLM-5.1 performs strongly across coding, reasoning, tool use, and agentic benchmarks. It is released as open source under the MIT License.

Evaluation Benchmarks

Category	Benchmark	Score	Rank
Web Development	WebDev Arena	1532	7
General Text	Text Arena	1475	7

Rankings

Overall Rank

Coding Rank

#18

Model Integrity

Total Score

68 / 100

GPU Requirements
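As a rough guide, the memory needed just to hold the weights can be estimated from the 754B total parameter count and the bits per weight of the chosen quantization. The sketch below is a back-of-the-envelope calculation under stated assumptions: the format names and bit widths are illustrative, and it ignores quantization metadata, KV cache, activations, and framework overhead, so real VRAM requirements will be higher.

```python
TOTAL_PARAMS = 754e9  # total parameter count from the model description

# Assumed bits per weight for a few common formats (illustrative; ignores
# quantization overhead such as scales and zero points).
BITS_PER_WEIGHT = {"FP16/BF16": 16, "FP8/INT8": 8, "INT4": 4}


def weight_memory_gb(num_params, bits):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Although only a few experts are active per token, all expert weights normally need to be resident (or offloaded and paged in), so the estimate uses the total parameter count rather than the active one.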

URL: https://apxml.com/models/glm-51
