![]() |
VOOZH | about |
Parameters
9B
Context Length
128K
Modality
Text
Architecture
Dense
License
MIT License
Release Date
30 Jun 2024
Knowledge Cutoff
Apr 2024
Attention
Attention Structure
Multi-Head Attention
Attention Heads
32
Key-Value Heads
2
Attention Head Dimension
128
Position Embedding
Absolute Position Embedding
RoPE Theta
-
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwigLU
Dimensions
Hidden Dimension Size
4,096
Number of Layers
40
FFN Intermediate Size (Dense)
13,696
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
151,552
The GLM-4-9B represents a significant iteration in the General Language Model (GLM) series developed by Zhipu AI and the THUDM Laboratory at Tsinghua University. This 9-billion parameter model is engineered to provide a sophisticated balance between computational efficiency and high-level linguistic performance, supporting a multilingual corpus across 26 languages. It is designed for diverse applications, including high-throughput translation, automated content synthesis, and complex question-answering systems. The model is released with open weights under the MIT License, facilitating broad community adoption and research in the field of large-scale pre-training.
Architecturally, GLM-4-9B is built upon a dense transformer framework that incorporates several structural optimizations. It utilizes Grouped Query Attention (GQA) with 32 attention heads and 2 key-value heads to reduce memory overhead during inference while maintaining robust semantic representation. The model implements an autoregressive blank-infilling objective during its pre-training on 10 trillion tokens, which enhances its ability to handle both prefix-based generation and bidirectional understanding. To support long-context processing, it employs Rotary Position Embeddings (RoPE) and is capable of extending its context window up to 128,000 tokens through YaRN (Yet another RoPE extensioN) scaling techniques.
Technical refinements in the GLM-4-9B architecture include the use of RMSNorm for stable layer normalization and the SiLU (Sigmoid Linear Unit) activation function, often implemented within a SwiGLU-style feed-forward network. The design specifically omits bias terms in most linear layers, except for those within the Query, Key, and Value components, a choice intended to improve the model's length extrapolation capabilities. This model serves as the foundation for specialized variants, such as the GLM-4-9B-Chat for human-aligned dialogue and the GLM-4V-9B for multimodal vision-language tasks, demonstrating its versatility as a base architecture for production-grade AI systems.
General Language Models from Z.ai
No evaluation benchmarks for GLM-4-9B available.
Overall Rank
-
Coding Rank
-
Total Score
B
65 / 100
Full Calculator
Choose the quantization method for model weights
Context Size: 1,024 tokens
©2025 ApX Machine Learning
APX AI
Online