Model Overview
- Model Architecture: GLM-5.2
- Input: Text
- Output: Text
- Supported Hardware Microarchitecture: AMD MI350/MI355
- ROCm: 7.0.0
- PyTorch: 2.9.0
- Transformers: 5.8.1
- Operating System(s): Linux
- Inference Engine: SGLang/vLLM
- Model Optimizer: AMD-Quark (V0.11)
- Weight quantization: MOE-only (shared experts quantized), OCP MXFP4, Static
- Activation quantization: MOE-only, OCP MXFP4, Dynamic
This model was built with GLM-5.2 model by applying AMD-Quark for MXFP4 quantization.
Model Quantization
The model was quantized from zai-org/GLM-5.2 using AMD-Quark. The weights and activations are quantized to MXFP4.
Quantization scripts:
cd Quark/examples/torch/language_modeling/llm_ptq/
python quantize_quark.py \
--model_dir zai-org/GLM-5.2 \
--output_dir GLM-5.2-MXFP4 \
--quant_scheme mxfp4 \
--exclude_layers "*self_attn*" "*mlp.gate" "*lm_head" \
"*mlp.gate_proj" "*mlp.up_proj" "*mlp.down_proj" \
"*layers.78.*" \ # Exclude the MTP layer (layer 78)
--file2file_quantization
Deployment
Use with SGLang/vLLM
This model can be deployed efficiently using the SGLang or vLLM backends.
Evaluation
The model was evaluated on GSM8K benchmarks.
Accuracy
| Benchmark | GLM-5.2 | GLM-5.2-MXFP4(this model) | Recovery |
| GSM8K (flexible-extract) | 0.9409 | 0.9393 | 99.8% |
Reproduction
The GSM8K results were obtained using the lm-evaluation-harness framework, based on the Docker image lmsysorg/sglang:v0.5.13.post1-rocm700-mi35x, with SGLang pre-installed inside the image and lm-eval compiled and installed from source.
lm_eval --model sglang \
--model_args pretrained=amd/GLM-5.2-MXFP4,tp_size=4 \
--tasks gsm8k \
--batch_size auto
The Docker image rocm/vllm-dev:nightly_main_20260616 with vLLM pre-installed can also be used for reproducing using vLLM backend.
License
Modifications Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
- Downloads last month
- 6
Model tree for amd/GLM-5.2-MXFP4
Base model
zai-org/GLM-5.2