GLM-5

Total Parameters

744B

Context Length

205K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

12 Feb 2026

Knowledge Cutoff

Dec 2025

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

Position Embedding

Absolute Position Embedding

RoPE Theta

1,000,000

Sliding Window Attention

Sliding Window Size

Normalization

RMS Normalization

Activation Function

Swish

Dimensions

Hidden Dimension Size

6,144

Number of Layers

FFN Intermediate Size (Dense)

2,048

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

154,880

Mixture of Experts

Total Expert Parameters

40.0B

Number of Experts

256

Active Experts

Shared Experts

FFN Intermediate Size (per Expert)

2,048

Dense Layers Before MoE
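
The dimensions listed above allow a rough parameter count for individual components. The sketch below is an estimate under stated assumptions, not the published breakdown: it assumes a gated (SwiGLU-style) feed-forward block with three weight matrices and no biases, which the sheet does not confirm (it lists only "Swish" as the activation). The number of layers is not given above, so no model-wide total is attempted. All function names are illustrative.

```python
# Values copied from the specification sheet above.
HIDDEN_SIZE = 6_144
VOCAB_SIZE = 154_880
EXPERT_FFN_SIZE = 2_048
NUM_EXPERTS = 256


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in one token-embedding table (vocab x hidden)."""
    return vocab_size * hidden_size


def gated_ffn_params(hidden_size: int, intermediate_size: int) -> int:
    """Parameters in one gated FFN.

    ASSUMPTION: three projections (gate, up, down) and no bias terms.
    The source does not state the exact FFN layout.
    """
    return 3 * hidden_size * intermediate_size


if __name__ == "__main__":
    per_expert = gated_ffn_params(HIDDEN_SIZE, EXPERT_FFN_SIZE)
    print(f"embedding table:      {embedding_params(VOCAB_SIZE, HIDDEN_SIZE):,}")
    print(f"one expert FFN:       {per_expert:,}")
    print(f"all experts, 1 layer: {per_expert * NUM_EXPERTS:,}")
```

Under these assumptions a single expert holds about 37.7 million parameters, and the 256 experts of one MoE layer together hold about 9.7 billion.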


GLM-5 is a flagship multimodal foundation model developed by Z.ai for complex systems engineering and long-horizon agentic workflows. It uses a Mixture-of-Experts (MoE) architecture with 744 billion total parameters, of which approximately 40 billion are activated per token. Because only a small share of the weights runs for any given token, the model keeps the capacity of a very large network while its per-token compute stays much closer to that of a 40-billion-parameter model. GLM-5 was trained on a 28.5 trillion token corpus that emphasizes high-quality code, technical documentation, and reasoning-dense data, in support of professional software development and autonomous problem-solving.
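
The relationship between total and active parameters can be made concrete with a little arithmetic. The sketch below uses the two figures quoted above; the "2 FLOPs per active parameter" forward-pass estimate is a common rule of thumb that ignores attention cost, and it is an assumption here, not a number from the source. Function names are illustrative.

```python
# Figures quoted in the text; the 40B active count is approximate.
TOTAL_PARAMS = 744e9
ACTIVE_PARAMS = 40e9


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb forward cost: about 2 FLOPs per active parameter.

    ASSUMPTION: ignores attention over the context, so it understates
    the cost of long sequences.
    """
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active share per token: {frac:.1%}")
    print(f"forward FLOPs per token: {forward_flops_per_token(ACTIVE_PARAMS):.1e}")
```

With the quoted figures, roughly 5.4% of the weights take part in each token's forward pass.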

GLM-5's most notable architectural change is the integration of DeepSeek Sparse Attention (DSA). Instead of having every query attend to every earlier token, DSA restricts each query to a selected subset of the context, which reduces the memory and compute cost of processing long sequences. For post-training, the model uses 'slime', an asynchronous reinforcement learning infrastructure. slime decouples generation from training to improve iteration throughput, which lets the model learn from complex, multi-step interactions and dynamic environments.
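
The general idea behind sparse attention can be illustrated with a small top-k example for a single query. This is a generic sketch, not the DSA implementation used in GLM-5: the relevance scores (`index_scores`) are simply passed in as an input, whereas a real system would compute them with its own selection mechanism. All names are hypothetical.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    index_scores: Sequence[float],
    k: int,
) -> List[float]:
    """Attend only to the k positions with the highest index_scores.

    ASSUMPTION: index_scores stands in for whatever cheap relevance
    estimate a sparse-attention scheme uses to pick tokens.
    """
    if not keys or not (len(keys) == len(values) == len(index_scores)):
        raise ValueError("keys, values, index_scores must be non-empty and aligned")
    if k <= 0:
        raise ValueError("k must be positive")
    k = min(k, len(keys))

    # Keep the k highest-scoring positions; all others are skipped entirely.
    selected = sorted(
        range(len(keys)), key=lambda i: index_scores[i], reverse=True
    )[:k]

    # Scaled dot-product attention over the selected positions only.
    scale = 1.0 / math.sqrt(len(query))
    logits = [_dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)

    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return out
```

The attention cost per query then scales with k instead of the full sequence length, which is where the long-context savings come from. Setting k to the number of keys recovers ordinary dense attention.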

GLM-5 supports a context window of up to 204,800 tokens and can generate up to 128,000 tokens in a single output. It supports tool use, real-time streaming, and structured output across frontend, backend, and data processing tasks. The model is released with open weights under the MIT License, so researchers and developers can serve it locally, fine-tune it, and integrate it into agentic frameworks without vendor lock-in.
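
When building requests, the two limits above interact: a long prompt leaves less room for output. The helper below is a minimal sketch that assumes the prompt and the generated tokens share the same 204,800-token window, which is the usual convention but is not stated explicitly in the source. The function name is hypothetical.

```python
# Limits quoted in the text above.
CONTEXT_WINDOW = 204_800
MAX_OUTPUT_TOKENS = 128_000


def max_new_tokens(
    prompt_tokens: int,
    context_window: int = CONTEXT_WINDOW,
    output_cap: int = MAX_OUTPUT_TOKENS,
) -> int:
    """Largest generation length that still fits in the window.

    ASSUMPTION: prompt and output tokens count against one shared window.
    Returns 0 when the prompt alone fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    return max(0, min(output_cap, remaining))


if __name__ == "__main__":
    for prompt in (4_800, 100_000, 204_800):
        print(prompt, "->", max_new_tokens(prompt))
```

Under this assumption, the full 128,000-token output is only available when the prompt is 76,800 tokens or shorter.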

About GLM-5

GLM-5 is the fifth generation of the General Language Model (GLM) series developed by Z.ai. It is a multimodal foundation model focused on advanced reasoning and long-horizon agentic work across diverse systems engineering tasks.


Evaluation Benchmarks

Rank

#32

Category	Benchmark	Score	Rank
Agentic Coding	LiveBench Agentic	0.55	11
Data Analysis	LiveBench Data Analysis	0.68	16
Professional Knowledge	MMLU Pro	0.86	16
Mathematics	LiveBench Mathematics	0.83	17
General Text	Text Arena	1457	19
StackUnseen	ProLLM Stack Unseen	0.551	21
Web Development	WebDev Arena	1435	25
Coding	LiveBench Coding	0.74	27
Reasoning	LiveBench Reasoning	0.69	28

Rankings

Overall Rank

#32

Coding Rank

#53

Model Integrity

Total Score

B+

79 / 100

GPU Requirements
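
A first-order estimate of weight memory follows from the parameter count and the storage precision. The sketch below covers weights only; KV cache, activations, and framework overhead are excluded, so real requirements are higher. Because any expert may be selected for a token, an MoE model normally needs all 744B parameters resident, not just the roughly 40B active ones. The bytes-per-parameter table and the 80 GiB GPU size are illustrative assumptions, not values from this page.

```python
import math

TOTAL_PARAMS = 744e9  # total parameter count quoted for GLM-5

# ASSUMPTION: nominal storage cost per weight, ignoring quantization
# metadata such as scales and zero points.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8/fp8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def min_gpus(memory_gib: float, gpu_gib: float) -> int:
    """Fewest GPUs whose combined memory covers memory_gib."""
    if gpu_gib <= 0:
        raise ValueError("gpu_gib must be positive")
    return math.ceil(memory_gib / gpu_gib)


if __name__ == "__main__":
    for name, nbytes in BYTES_PER_PARAM.items():
        gib = weight_memory_gib(TOTAL_PARAMS, nbytes)
        # 80 GiB per GPU is a hypothetical device size for illustration.
        print(f"{name:>9}: {gib:8.1f} GiB, at least {min_gpus(gib, 80)} x 80 GiB GPUs")
```

These counts are lower bounds: long contexts add KV-cache memory on top of the weights.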



URL: https://apxml.com/models/glm-5
