
URL: https://apxml.com/models/glm-51


GLM-5.1

Total Parameters

754B

Context Length

200K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

7 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Multi-Layer Attention

Attention Heads

64

Key-Value Heads

64

Attention Head Dimension

64

Position Embedding

RoPE

RoPE Theta

1,000,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU
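The Position Embedding entries above specify RoPE with a base (theta) of 1,000,000 and a head dimension of 64. The NumPy sketch below shows how that base becomes per-dimension rotation frequencies and how the rotation is applied to query or key vectors. It assumes the interleaved layout, where adjacent dimensions (0, 1), (2, 3), and so on form rotation pairs. Some implementations rotate the first and second halves of the head instead, and this page does not say which layout GLM-5.1 uses.

```python
import numpy as np

ROPE_THETA = 1_000_000.0  # "RoPE Theta" from the spec table
HEAD_DIM = 64             # "Attention Head Dimension" from the spec table


def rope_inv_freq(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> np.ndarray:
    """One inverse frequency per dimension pair: theta ** (-2i / head_dim)."""
    return theta ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)


def apply_rope(x: np.ndarray, positions: np.ndarray, theta: float = ROPE_THETA) -> np.ndarray:
    """Rotate each row of x (seq_len, head_dim) by an angle proportional to its position.

    Assumes the interleaved layout: dimensions (0, 1), (2, 3), ... form rotation pairs.
    """
    _, head_dim = x.shape
    angles = positions[:, None] * rope_inv_freq(head_dim, theta)[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    even, odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=np.float64)
    out[:, 0::2] = even * cos - odd * sin
    out[:, 1::2] = even * sin + odd * cos
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((1, HEAD_DIM))
    k = rng.standard_normal((1, HEAD_DIM))
    # The score depends only on the relative offset (3 in both cases), so this prints True.
    s1 = (apply_rope(q, np.array([5.0])) @ apply_rope(k, np.array([2.0])).T).item()
    s2 = (apply_rope(q, np.array([105.0])) @ apply_rope(k, np.array([102.0])).T).item()
    print(np.isclose(s1, s2))
```

A large base such as 1,000,000 makes the lowest-frequency pairs rotate very slowly. Raising the base this way is a common approach for keeping far-apart positions distinguishable over long contexts like the 200K window listed here.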

Dimensions

Hidden Dimension Size

6,144

Number of Layers

78

FFN Intermediate Size (Dense)

2,048

Multi-Token Prediction Heads

1
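The spec lists RMS normalization, a SwiGLU activation, a hidden size of 6,144, and an FFN intermediate size of 2,048. The sketch below shows what those entries usually mean in a feed-forward block. It also counts the weights in one such FFN at the listed sizes. It assumes bias-free projections and a learned per-channel RMSNorm gain. These are common conventions but are not confirmed for GLM-5.1 on this page.

```python
import numpy as np

HIDDEN = 6_144        # "Hidden Dimension Size"
INTERMEDIATE = 2_048  # "FFN Intermediate Size" (dense and per-expert entries)


def rms_norm(x: np.ndarray, gain: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Divide each vector by its root-mean-square, then apply a learned per-channel gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: (SiLU(x W_gate) * (x W_up)) W_down, with no biases assumed."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def swiglu_param_count(hidden: int = HIDDEN, intermediate: int = INTERMEDIATE) -> int:
    """Gate and up projections (hidden -> intermediate) plus the down projection back."""
    return 3 * hidden * intermediate


if __name__ == "__main__":
    print(swiglu_param_count())  # 37748736 weights per FFN at the listed sizes
```

At the listed sizes, a single SwiGLU FFN holds about 37.7M weights. Each expert is therefore a very small fraction of the model's total parameter count.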

Tokenizer

Vocabulary Size

154,880

Mixture of Experts

Total Expert Parameters

40.0B

Number of Experts

257

Active Experts

9

Shared Experts

1

FFN Intermediate Size (per Expert)

2,048

Dense Layers Before MoE

3
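The Mixture of Experts entries describe 257 experts, one of them shared, with 9 active per token. That means each token uses the shared expert plus the top 8 of 256 routed experts. Below is a minimal NumPy sketch of that routing pattern. The softmax over the selected experts' router logits is an assumption made for illustration. The page does not specify GLM-5.1's gating function (some large MoE models use sigmoid scores instead) or any load-balancing terms.

```python
import numpy as np

N_ROUTED = 256  # 257 experts in total, minus the 1 shared expert
TOP_K = 8       # 9 active experts = 8 routed + 1 shared


def moe_forward(x, router_w, experts, shared_expert, top_k=TOP_K):
    """Sparse MoE layer: each token uses the shared expert plus its top_k routed experts.

    x: (tokens, hidden); router_w: (hidden, n_routed);
    experts: list of callables (n, hidden) -> (n, hidden); shared_expert: same signature.
    """
    logits = x @ router_w                                    # (tokens, n_routed)
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]        # best-scoring experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Assumed gating: softmax over the selected experts only, so weights sum to 1 per token.
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)

    out = np.array(shared_expert(x), dtype=np.float64)       # shared expert sees every token
    for slot in range(top_k):
        for e in np.unique(top_idx[:, slot]):
            rows = np.nonzero(top_idx[:, slot] == e)[0]      # tokens sending this slot to expert e
            out[rows] += gates[rows, slot][:, None] * experts[e](x[rows])
    return out, top_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = 16
    mats = rng.standard_normal((N_ROUTED, hidden, hidden)) * 0.1
    experts = [lambda h, w=w: h @ w for w in mats]           # toy linear experts
    x = rng.standard_normal((4, hidden))
    out, idx = moe_forward(x, rng.standard_normal((hidden, N_ROUTED)), experts, lambda h: 0.5 * h)
    print(out.shape, idx.shape)  # (4, 16) (4, 8)
```

Only the selected experts run for each token. This is how an MoE model keeps per-token compute far below what its total parameter count would suggest.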

[Figure: GLM-5.1 architecture diagram]

GLM-5.1 is Z.ai's flagship model for long-horizon agentic coding tasks. It is built on a novel GlmMoeDSA architecture with 754B total parameters across 78 layers. The architecture uses 256 routed experts plus 1 shared expert, and each token activates 8 routed experts plus the shared one. It combines Gated DeltaNet linear attention with standard attention and sparse MoE feed-forward networks, which is intended to keep inference efficient without sacrificing capability. GLM-5.1 reports a state-of-the-art 58.4% on SWE-Bench Pro, along with 63.5% on Terminal-Bench 2.0, 95.3% on AIME 2026, and 86.2% on GPQA-Diamond. It is designed for up to 8 hours of sustained autonomous execution, breaking complex engineering tasks into iterative experiment, analyze, and optimize loops. It supports a 200K-token context window and up to 128K output tokens. The model is available via API as glm-5.1 on Z.ai and BigModel.cn, and was released on April 7, 2026 under the MIT license.
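The description says GLM-5.1 mixes Gated DeltaNet linear attention with standard attention. As a rough illustration, here is the gated delta rule from the published Gated DeltaNet formulation, written as a naive per-token recurrence. The state update is S_t = α_t S_{t-1} (I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ, and the output is o_t = S_t q_t. Unlike softmax attention, the state has a fixed size regardless of sequence length. This page does not say how GLM-5.1 computes α_t and β_t, normalizes keys, or interleaves these layers with standard attention. Those inputs are therefore left as plain arrays here.

```python
import numpy as np


def gated_delta_rule(q, k, v, alpha, beta):
    """Naive recurrent gated delta rule.

    q, k: (T, d_k), keys assumed L2-normalized; v: (T, d_v);
    alpha: (T,) decay gates in [0, 1]; beta: (T,) write strengths in [0, 1].
    Returns outputs of shape (T, d_v).
    """
    T, d_k = k.shape
    S = np.zeros((v.shape[1], d_k))          # fixed-size memory, independent of T
    out = np.empty((T, v.shape[1]))
    for t in range(T):
        kt = k[t]
        # alpha_t * S_{t-1} (I - beta_t k_t k_t^T): decay, then erase what k_t currently retrieves
        S = alpha[t] * (S - beta[t] * np.outer(S @ kt, kt))
        # + beta_t v_t k_t^T: write the new value under key k_t
        S = S + beta[t] * np.outer(v[t], kt)
        out[t] = S @ q[t]
    return out


if __name__ == "__main__":
    k = np.eye(4)[:2]                         # two orthonormal keys
    v = np.array([[1.0, 2.0], [3.0, 4.0]])
    ones = np.ones(2)
    # Querying each key right after writing it recovers the value stored under it.
    print(gated_delta_rule(k, k, v, alpha=ones, beta=ones))
```

The per-token loop is written for clarity only. Practical kernels compute the same recurrence in chunks to run efficiently on GPUs.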

About GLM-5.1

GLM-5.1 is Z.ai's next-generation flagship model for agentic engineering. It uses a novel hybrid MoE architecture (GlmMoeDSA) that combines Gated DeltaNet linear attention layers with standard attention and sparse MoE feed-forward networks. It achieves state-of-the-art performance on SWE-Bench Pro (58.4%) and is designed for long-horizon autonomous tasks, sustaining execution for up to 8 hours. With 754B total parameters and a 200K context window, GLM-5.1 performs strongly across coding, reasoning, tool-use, and agentic benchmarks. It is released as open source under the MIT License.



Evaluation Benchmarks


Category          Benchmark      Score   Rank
Web Development   WebDev Arena   1532    7
General Text      Text Arena     1475    7

Rankings

Overall Rank

#5

Coding Rank

#18

Model Integrity

Total Score: B (68 / 100)

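To get a rough sense of GPU memory requirements, the sketch below applies standard back-of-the-envelope arithmetic to the listed specs. All 754B parameters must be resident in memory even though only a few experts run per token, so weight memory scales with the total count. The KV-cache function is a deliberate upper bound. It assumes every one of the 78 layers is standard attention that caches full keys and values for 64 KV heads of dimension 64. That overstates the cost if some layers use linear attention. Activations, runtime overhead, and quantization metadata are ignored.

```python
TOTAL_PARAMS = 754e9   # total parameters from the spec table
LAYERS = 78            # "Number of Layers"
KV_HEADS = 64          # "Key-Value Heads"
HEAD_DIM = 64          # "Attention Head Dimension"


def weight_memory_gb(bits_per_param: float, total_params: float = TOTAL_PARAMS) -> float:
    """Raw weight storage in GB (1e9 bytes); excludes activations and runtime overhead."""
    return total_params * bits_per_param / 8 / 1e9


def kv_cache_upper_bound_gb(tokens: int, bytes_per_value: int = 2, batch: int = 1) -> float:
    """Keys and values (factor 2) for every layer, KV head, and cached position.

    Upper bound: assumes all LAYERS are standard attention with full per-head KV caching.
    """
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * tokens * batch / 1e9


if __name__ == "__main__":
    for name, bits in [("BF16", 16), ("FP8", 8), ("INT4", 4)]:
        print(f"{name} weights: {weight_memory_gb(bits):,.0f} GB")
    print(f"KV cache upper bound at 200K tokens (BF16): {kv_cache_upper_bound_gb(200_000):,.1f} GB")
```

Under these assumptions, the script reports about 1,508 GB of weights in BF16, 754 GB in FP8, and 377 GB at 4 bits. It adds up to roughly 256 GB of KV cache for a single 200K-token sequence. In practice, the model must therefore be sharded across multiple GPUs.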