Xiaobo Chen

Xiaobo Chen has a background in large-scale model inference and training optimization, with a focus on high-performance computing and enhancing the efficiency of deep learning systems. He is currently at AMD, working on large model training optimization and the development of high-performance acceleration libraries.

Posts by Xiaobo Chen

June 10, 2026

Dropless MoE Training in JAX with Primus-Turbo

Learn how to train dropless MoE in JAX/MaxText with Primus-Turbo's grouped GEMM and DeepEP all-to-all for faster, more memory-efficient training.

https://rocm.blogs.amd.com/software-tools-optimization/maxtext-dropless-moe/README.html

January 15, 2026

Deep Dive into Primus: High-Performance Training for Large Language Models

Learn how to achieve peak dense LLM training performance on AMD Instinct™ GPUs using Primus’s unified CLI and optimized backend presets.

https://rocm.blogs.amd.com/software-tools-optimization/primus-deep-dive/README.html

December 16, 2025

MoE Training Best Practices on AMD GPUs

Learn how to optimize Mixture-of-Experts (MoE) model training on AMD Instinct GPUs with ROCm. Maximize your AI training performance now!

https://rocm.blogs.amd.com/software-tools-optimization/primus-moe-package/README.html

September 19, 2025

An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

Primus streamlines training on AMD ROCm, from fine-tuning to massive pretraining on MI300X GPUs: faster, safer, and easier to debug.

https://rocm.blogs.amd.com/software-tools-optimization/primus-large-models/README.html

August 22, 2025

Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html

URL: https://rocm.blogs.amd.com/authors/xiaobo-chen.html
