| Specification | Value |
|---|---|
| Total Parameters | 355B |
| Context Length | 128K |
| Modality | Multimodal |
| Architecture | Mixture of Experts (MoE) |
| License | MIT License |
| Release Date | 28 Jul 2025 |
| Knowledge Cutoff | Jan 2025 |

Attention

| Specification | Value |
|---|---|
| Attention Structure | Grouped-Query Attention (GQA) |
| Attention Heads | 96 |
| Key-Value Heads | 8 |
| Attention Head Dimension | 128 |
| Position Embedding | Rotary Position Embedding (RoPE) |
| RoPE Theta | 1,000,000 |
| Sliding Window Attention | No |
| Sliding Window Size | - |

| Specification | Value |
|---|---|
| Normalization | RMS Normalization |
| Activation Function | SwiGLU |

Dimensions

| Specification | Value |
|---|---|
| Hidden Dimension Size | 5,120 |
| Number of Layers | 96 |
| FFN Intermediate Size (Dense) | 1,536 |
| Multi-Token Prediction Heads | 1 |

Tokenizer

| Specification | Value |
|---|---|
| Vocabulary Size | 151,552 |

Mixture of Experts

| Specification | Value |
|---|---|
| Active Parameters (per token) | 32.0B |
| Number of Experts | 160 |
| Active Experts | 8 |
| Shared Experts | 1 |
| FFN Intermediate Size (per Expert) | 1,536 |
| Dense Layers Before MoE | 3 |

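The attention figures in the card (96 layers, 8 key-value heads, head dimension 128) are enough for a rough KV-cache estimate. The sketch below is only an illustration under stated assumptions: every layer is assumed to cache keys and values for all tokens, cache entries are assumed to be 16-bit, and the function name `kv_cache_bytes` is made up for this example.

```python
# Values taken from the spec card.
NUM_LAYERS = 96
NUM_ATTENTION_HEADS = 96
NUM_KV_HEADS = 8
HEAD_DIM = 128

# Assumption: cache entries are stored in a 16-bit format (2 bytes each).
BYTES_PER_VALUE = 2


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Estimate KV-cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * num_tokens


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(128_000)
# Hypothetical comparison: the same model with one KV head per attention head.
mha_per_token = kv_cache_bytes(1, kv_heads=NUM_ATTENTION_HEADS)

print(f"Per token:        {per_token / 1024:.0f} KiB")
print(f"128,000 tokens:   {full_context / 2**30:.3f} GiB")
print(f"Saving vs. 96 KV heads: {mha_per_token // per_token}x")
```

Under these assumptions, sharing 8 key-value heads across 96 query heads cuts the cache by a factor of 12 relative to one key-value head per query head. Real deployments may differ, for example when the cache is quantized.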
GLM-4.5 is a flagship multimodal large language model developed by Z.ai that combines reasoning, software engineering, and agentic capabilities in a single model. It uses a Mixture-of-Experts (MoE) design with 355 billion total parameters, of which only 32 billion are activated per token in a forward pass. The model supports two execution modes: a slower 'Thinking Mode' that performs multi-step reasoning and planning before answering, and a low-latency 'Non-Thinking Mode' that responds immediately for standard interactive tasks.
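The parameter-efficiency claim comes down to simple arithmetic on the two figures quoted above. A minimal sketch, with a helper name (`active_fraction`) chosen only for this example:

```python
# Figures quoted in the description above.
TOTAL_PARAMS_B = 355   # total parameters, in billions
ACTIVE_PARAMS_B = 32   # parameters activated per token, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of the model's parameters used for each token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {fraction:.1%}")  # about 9.0%
```

Roughly one parameter in eleven takes part in any single forward pass, although all 355 billion must still be held in memory for serving.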
GLM-4.5 favors architectural depth over width, a choice intended to improve logical deduction and mathematical reasoning. The model uses Grouped-Query Attention (GQA) with 96 attention heads and a hidden dimension of 5,120. Its MoE layers use sigmoid-gated routing to keep expert utilization stable and balanced, and QK-Norm is applied in attention to stabilize the attention logits. The model was trained on a 23-trillion-token corpus, including 7 trillion tokens dedicated to code and reasoning data, followed by reinforcement learning on the custom-built 'slime' infrastructure to refine autonomous decision-making.
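Sigmoid-gated routing can be illustrated with a small, self-contained sketch. This is not the GLM-4.5 router: the expert count (160) and top-k (8) come from the spec card, but the renormalization of the selected scores, the random logits, and the function names are assumptions made for illustration, and any load-balancing terms are omitted.

```python
import math
import random

NUM_EXPERTS = 160  # routed experts, from the spec card
TOP_K = 8          # experts activated per token, from the spec card


def sigmoid(x: float) -> float:
    """Logistic function; each expert is scored independently of the others."""
    return 1.0 / (1.0 + math.exp(-x))


def route(logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Select the top_k experts by sigmoid score.

    Returns a mapping from expert index to mixing weight. Assumption: the
    selected scores are renormalized so the weights sum to 1.
    """
    scores = [sigmoid(z) for z in logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


# Stand-in router logits for a single token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(logits)

print("Selected experts:", sorted(weights))
print("Weight sum:", round(sum(weights.values()), 6))
```

Unlike a softmax gate, the sigmoid scores do not compete with each other before selection, so one expert's score does not shrink when another's grows.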
Designed for production-grade agent applications, GLM-4.5 supports native function calling and multi-step web browsing with a high success rate. It offers a 128,000-token context window and a maximum output of 96,000 tokens, making it suitable for long-document analysis and full-stack software development. The weights are released openly under the MIT License, which permits use in both research and commercial settings.
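The two limits interact when planning a request. The sketch below assumes that prompt and generated tokens share the same 128,000-token window, which is a common convention but is not stated in the text; the helper names are made up for this example.

```python
CONTEXT_WINDOW = 128_000    # tokens, from the description above
MAX_OUTPUT_TOKENS = 96_000  # tokens, from the description above


def max_prompt_tokens(reserved_output: int) -> int:
    """Prompt budget left after reserving room for the reply.

    Assumption: input and output tokens count against the same window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int) -> bool:
    """Check whether a prompt fits alongside the reserved output budget."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)


print(max_prompt_tokens(MAX_OUTPUT_TOKENS))  # 32000
print(fits(100_000, 8_000))                  # True
print(fits(100_000, MAX_OUTPUT_TOKENS))      # False
```

Under this assumption, reserving the full 96,000-token output leaves 32,000 tokens for the prompt, so very long documents and very long generations cannot both be maximized in one request.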
General Language Models from Z.ai
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Web Development | WebDev Arena | 1410 | 30 |
| Graduate-Level QA | GPQA | 0.791 | 30 |
| Professional Knowledge | MMLU Pro | 0.81 | 32 |
| General Text | Text Arena | 1411 | 48 |

| Metric | Value |
|---|---|
| Overall Rank | #66 |
| Coding Rank | #50 |
| Total Score | B (69 / 100) |

©2025 ApX Machine Learning