| Specification | Value |
|---|---|
| Total Parameters | 744B |
| Context Length | 205K |
| Modality | Multimodal |
| Architecture | Mixture of Experts (MoE) |
| License | MIT |
| Release Date | 12 Feb 2026 |
| Knowledge Cutoff | Dec 2025 |
Attention
| Specification | Value |
|---|---|
| Attention Structure | Multi-Head Attention |
| Attention Heads | 64 |
| Key-Value Heads | 64 |
| Attention Head Dimension | 64 |
| Position Embedding | Rotary Position Embedding (RoPE) |
| RoPE Theta | 1,000,000 |
| Sliding Window Attention | No |
| Sliding Window Size | - |
| Normalization | RMS Normalization |
| Activation Function | Swish |
Dimensions
| Specification | Value |
|---|---|
| Hidden Dimension Size | 6,144 |
| Number of Layers | 80 |
| FFN Intermediate Size (Dense) | 2,048 |
| Multi-Token Prediction Heads | 1 |
Tokenizer
| Specification | Value |
|---|---|
| Vocabulary Size | 154,880 |
Mixture of Experts
| Specification | Value |
|---|---|
| Active Parameters (per token) | 40.0B |
| Number of Experts | 256 |
| Active Experts | 8 |
| Shared Experts | 1 |
| FFN Intermediate Size (per Expert) | 2,048 |
| Dense Layers Before MoE | 3 |
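
To make the Mixture of Experts figures listed above concrete, here is a minimal sketch of top-k expert routing for a single token. It is illustrative only: the `route` function, the random router logits, and the softmax over the selected logits are assumptions, not confirmed details of GLM-5's router.

```python
import math
import random

NUM_EXPERTS = 256  # number of routed experts listed above
TOP_K = 8          # active (routed) experts per token
NUM_SHARED = 1     # shared experts, applied to every token


def route(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. This normalization
    is an assumption made for illustration.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router output for one token (random stand-in values).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
experts_run_per_token = len(selected) + NUM_SHARED
print(selected)
print("experts evaluated per token:", experts_run_per_token)
```

Only the selected experts and the shared expert run for a given token, which is why the per-token compute is far smaller than the full parameter count would suggest.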
GLM-5 is a flagship multimodal foundation model developed by Z.ai, designed for complex systems engineering and long-horizon agentic workflows. It uses a Mixture-of-Experts (MoE) architecture that scales to 744 billion total parameters, of which approximately 40 billion are activated per token. This design provides high-capacity reasoning and specialized knowledge retrieval while keeping per-token compute low enough for large-scale deployment. The model is trained on a 28.5-trillion-token corpus that emphasizes high-quality code, technical documentation, and reasoning-dense data to support professional-grade software development and autonomous problem-solving.
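The gap between total and active parameters can be roughly reproduced from the listed dimensions. The sketch below is a back-of-the-envelope estimate, not an official breakdown: it assumes each expert is a gated FFN with three bias-free weight matrices (gate, up, down), and it counts expert weights only, ignoring attention, embeddings, the dense layers, and the router.

```python
HIDDEN = 6144          # hidden dimension size
EXPERT_FFN = 2048      # FFN intermediate size per expert
NUM_EXPERTS = 256      # routed experts per MoE layer
ACTIVE_EXPERTS = 8     # routed experts used per token
SHARED_EXPERTS = 1     # shared experts used by every token
NUM_LAYERS = 80
DENSE_LAYERS = 3       # dense layers before the MoE layers begin

# Assumption: gated FFN = gate, up, and down projections, no biases.
per_expert = 3 * HIDDEN * EXPERT_FFN
moe_layers = NUM_LAYERS - DENSE_LAYERS

# Weights stored across all routed experts in all MoE layers.
routed_total = per_expert * NUM_EXPERTS * moe_layers

# Expert weights actually used for one token (routed + shared).
active_expert_params = per_expert * (ACTIVE_EXPERTS + SHARED_EXPERTS) * moe_layers

print(f"per expert:        {per_expert / 1e6:.1f}M")
print(f"routed experts:    {routed_total / 1e9:.1f}B")
print(f"active per token:  {active_expert_params / 1e9:.1f}B (experts only)")
```

Under these assumptions the routed experts dominate the stored weights, while each token touches only a small fraction of them. The estimate lands in the same range as the quoted figures, but it should not be read as a confirmed accounting of the model.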
Technically, GLM-5 introduces several architectural innovations, most notably the integration of DeepSeek Sparse Attention (DSA). This mechanism optimizes the standard attention block by dynamically allocating computational resources, which significantly reduces the memory and compute overhead associated with processing long sequences. Additionally, the model leverages an asynchronous reinforcement learning infrastructure known as 'slime' during post-training. This framework decouples generation from training to improve iteration throughput, allowing the model to learn effectively from complex, multi-step interactions and dynamic environments.
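The general idea behind sparse attention can be illustrated with a generic top-k variant for a single query. This is not the DeepSeek Sparse Attention implementation: the `sparse_attention` function is hypothetical, and it uses plain dot products as the selection score, which is a stand-in for whatever scoring DSA actually performs.

```python
import math


def sparse_attention(query, keys, values, top_k):
    """Attend to only the top_k highest-scoring keys for one query vector.

    Returns the output vector and the sorted indices of the keys kept.
    """
    # Selection pass: score every key (stand-in: raw dot product).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Softmax over the kept keys only, with the usual 1/sqrt(d) scaling.
    scale = 1.0 / math.sqrt(len(query))
    logits = [scores[i] * scale for i in keep]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)

    # Weighted sum of the values that belong to the kept keys.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return out, sorted(keep)


# Tiny hypothetical example: 4 cached tokens, keep the best 2.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [5.0, 5.0]]

out, kept = sparse_attention(query, keys, values, top_k=2)
print(kept, out)
```

The softmax and the weighted sum run over `top_k` entries instead of the full sequence, so the cost of that step no longer grows with sequence length. In this toy version the selection pass still scans every key.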
Optimized for long-context stability, GLM-5 supports a context window of up to 204,800 tokens and can generate up to 128,000 tokens in a single output. Its operational capabilities include tool use, real-time streaming, and structured output across frontend, backend, and data processing tasks. The model is released with open weights under the MIT License, enabling researchers and developers to perform local serving, fine-tuning, and integration into diverse agentic frameworks without vendor lock-in.
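For anyone planning local serving, a rough key-value cache estimate shows what a full 204,800-token context would cost if every layer cached keys and values at the listed attention dimensions. This is a sketch under stated assumptions: a plain multi-head cache with 64 key-value heads of dimension 64 in each of 80 layers, 2 bytes per value (FP16 or BF16), and no cache compression or sparsity savings. A real deployment may use far less.

```python
NUM_LAYERS = 80
KV_HEADS = 64
HEAD_DIM = 64
CONTEXT_TOKENS = 204_800
BYTES_PER_VALUE = 2  # assumption: FP16/BF16 cache entries

# Each token stores one key and one value vector per KV head per layer.
bytes_per_token = 2 * NUM_LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE

total_bytes = bytes_per_token * CONTEXT_TOKENS
total_gib = total_bytes / 2**30

print(f"KV cache per token:   {bytes_per_token / 2**20:.2f} MiB")
print(f"KV cache at full ctx: {total_gib:.1f} GiB")
```

The estimate scales linearly with context length, so halving the context halves the cache.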
GLM-5 is the fifth generation of General Language Models developed by Z.ai. It represents a significant advance in multimodal foundation models, with stronger reasoning and long-horizon agentic performance across diverse systems engineering tasks.
Rank
#32
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Agentic Coding | LiveBench Agentic | 0.55 | 11 |
| Data Analysis | LiveBench Data Analysis | 0.68 | 16 |
| Professional Knowledge | MMLU Pro | 0.86 | 16 |
| Mathematics | LiveBench Mathematics | 0.83 | 17 |
| General Text | Text Arena | 1457 | 19 |
| StackUnseen | ProLLM Stack Unseen | 0.551 | 21 |
| Web Development | WebDev Arena | 1435 | 25 |
| Coding | LiveBench Coding | 0.74 | 27 |
| Reasoning | LiveBench Reasoning | 0.69 | 28 |
| Metric | Value |
|---|---|
| Overall Rank | #32 |
| Coding Rank | #53 |
| Total Score | B+ (79 / 100) |
©2025 ApX Machine Learning