GLM-4-9B

Parameters: 9B
Context Length: 128K
Modality: Text
Architecture: Dense
License: MIT License
Release Date: 30 Jun 2024
Knowledge Cutoff: Apr 2024

Technical Specifications

Attention

Attention Structure: Grouped Query Attention
Attention Heads: 32
Key-Value Heads: 2
Attention Head Dimension: 128
Position Embedding: Rotary Position Embedding (RoPE)
RoPE Theta: not listed
Sliding Window Attention: not listed
Sliding Window Size: not listed
Normalization: RMS Normalization
Activation Function: SwiGLU

Dimensions

Hidden Dimension Size: 4,096
Number of Layers: not listed
FFN Intermediate Size (Dense): 13,696
Multi-Token Prediction Heads: not listed

Tokenizer

Vocabulary Size: 151,552
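
As a rough check on how the dimensions above relate to model size, the sketch below counts the weights in the token embedding and in one feed-forward block. It assumes a SwiGLU block made of three bias-free projection matrices (gate, up, down), which is the common layout but is not spelled out in this sheet. The function names are illustrative and do not come from any GLM codebase.

```python
HIDDEN_SIZE = 4096    # Hidden Dimension Size from the sheet
FFN_SIZE = 13696      # FFN Intermediate Size (Dense) from the sheet
VOCAB_SIZE = 151552   # Vocabulary Size from the sheet


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Weights in a token embedding table: one hidden-size vector per token."""
    return vocab_size * hidden_size


def swiglu_ffn_params(hidden_size: int, ffn_size: int) -> int:
    """Weights in one SwiGLU feed-forward block.

    Assumption: three bias-free matrices (gate and up are hidden -> ffn,
    down is ffn -> hidden).
    """
    return 3 * hidden_size * ffn_size


embed = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
ffn = swiglu_ffn_params(HIDDEN_SIZE, FFN_SIZE)
print(f"embedding table: {embed:,} weights")
print(f"one FFN block:   {ffn:,} weights")
```

Multiplying the per-block figure by the layer count, which this sheet does not list, would give the total feed-forward share of the model.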

GLM-4-9B

GLM-4-9B is a 9-billion-parameter model in the General Language Model (GLM) series developed by Zhipu AI and the THUDM laboratory at Tsinghua University. It aims to balance computational cost against language quality and supports 26 languages. Intended applications include translation, content generation, and question answering. The weights are openly released under the MIT License.

GLM-4-9B is a dense transformer. It uses Grouped Query Attention (GQA) with 32 attention heads and 2 key-value heads, so each key-value head is shared by 16 query heads; this shrinks the key-value cache that must be held in memory during inference. The model was pre-trained on 10 trillion tokens with an autoregressive blank-infilling objective, which supports both prefix-based generation and bidirectional understanding. For long contexts it uses Rotary Position Embeddings (RoPE), and its context window can be extended to 128,000 tokens with YaRN (Yet another RoPE extensioN) scaling.
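
To make the memory effect of GQA concrete, here is a minimal sketch of the key-value cache size per token. The head counts (32 and 2) and head dimension (128) come from this page; the layer count of 40 and the 2-byte (16-bit) cache precision are assumptions made only for illustration, since neither is stated here. The function name is hypothetical.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of key-value cache stored for one token.

    The leading factor 2 covers the separate key and value tensors.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


NUM_LAYERS = 40   # ASSUMPTION: the layer count is not listed on this page
HEAD_DIM = 128    # Attention Head Dimension from the sheet

# Full multi-head attention would cache one key-value pair per attention head.
mha_bytes = kv_cache_bytes_per_token(NUM_LAYERS, 32, HEAD_DIM)
# GQA caches only the 2 shared key-value heads.
gqa_bytes = kv_cache_bytes_per_token(NUM_LAYERS, 2, HEAD_DIM)

print(f"MHA-style cache: {mha_bytes:,} bytes per token")
print(f"GQA cache:       {gqa_bytes:,} bytes per token")
print(f"reduction:       {mha_bytes // gqa_bytes}x")
```

The reduction factor is the ratio of attention heads to key-value heads (32 / 2), so it holds regardless of the assumed layer count or precision.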

The architecture uses RMSNorm for layer normalization and a SwiGLU feed-forward network, which gates its hidden activations with the SiLU (Sigmoid Linear Unit) function. Most linear layers omit bias terms; only the Query, Key, and Value projections keep them, a design intended to improve the model's length extrapolation. GLM-4-9B is the base for specialized variants such as GLM-4-9B-Chat, aligned for dialogue, and GLM-4V-9B, for multimodal vision-language tasks.
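
The two building blocks named above can be sketched in a few lines of plain Python. This is an illustration of the general RMSNorm and SwiGLU definitions on tiny vectors, not GLM-4-9B's actual implementation: the epsilon value, the toy weight matrices, and all function names are assumptions, and the feed-forward projections are written without bias to match the description above.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Scale x so its root-mean-square is about 1, then apply a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x)), no bias terms."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy sizes: hidden dimension 4, intermediate dimension 2.
x = [1.0, -2.0, 3.0, -4.0]
normed = rms_norm(x, [1.0] * 4)

w_gate = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
w_up = [[0.5, -0.5, 0.5, -0.5], [0.2, 0.2, 0.2, 0.2]]
w_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

out = swiglu_ffn(normed, w_gate, w_up, w_down)
print(out)
```

In the real model the same structure would apply with a hidden size of 4,096 and an intermediate size of 13,696, using tensor operations rather than Python lists.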

About GLM Family

General Language Models from Z.ai

Evaluation Benchmarks

No evaluation benchmarks for GLM-4-9B available.

Model Integrity

Total Score: 65 / 100

URL: https://apxml.com/models/glm-4-9b
