Last indexed: 7 May 2026 (2e12c1)

What is AReaL

Purpose and Scope

This document introduces AReaL (Asynchronous Reinforcement Learning System), explaining its purpose, target use cases, core capabilities, and architectural components. AReaL is designed to provide a high-throughput, stable, and flexible infrastructure for aligning Large Language Models (LLMs) through reinforcement learning, with a specific focus on reasoning and agentic capabilities.

What is AReaL?

AReaL is an open-source, fully asynchronous reinforcement learning training system developed by researchers from Tsinghua IIIS and the AReaL Team at Ant Group README.md15-17. It is built to scale RL training for large reasoning models (LRMs) and AI agents by decoupling the generation of experience (rollouts) from the optimization of the model (training) blog/AReaL_v0_3.md9-11.

Sources: README.md15-23 blog/AReaL_v0_3.md9-13

Technical Foundation

AReaL leverages a modern high-performance stack to ensure scalability and efficiency:

Component	Technology	Reference
Language	Python 3.12+ (managed via `uv`)	AGENTS.md7 CLAUDE.md8
Deep Learning	PyTorch Native (FSDP2, Archon)	CLAUDE.md8 AGENTS.md7
Training Backends	FSDP, Megatron-LM, Archon	CLAUDE.md14-15 AGENTS.md64-65
Inference Backends	SGLang (Default), vLLM	blog/AReaL_v0_2.md60-63 CLAUDE.md14
Infrastructure	Ray, SLURM, Local, SkyPilot	CLAUDE.md18-19 AGENTS.md66
Hardware	NVIDIA CUDA, Ascend NPU	README.md82-87 CLAUDE.md42

Sources: README.md82-87 blog/AReaL_v0_2.md60-63 AGENTS.md7-66 CLAUDE.md8-19

Target Use Cases and Users

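AReaL is optimized for scenarios where standard synchronous RL becomes a bottleneck: because generation and training alternate, GPUs sit idle while one phase waits for the other, and each generation batch must wait for its longest output blog/AReaL_v0_3.md49-51.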

Reasoning Model Alignment: Training models to solve complex math, coding, and logic problems using algorithms like GRPO (Group Relative Policy Optimization) or PPO (Proximal Policy Optimization) blog/AReaL_v0_1.md68-71 blog/AReaL_v0_2.md125-130.
Agentic RL: Fine-tuning models to use tools and navigate multi-turn environments. AReaL supports OpenClaw and ZeroClaw patterns for external agent runtimes README.md31-34 README.md61-63.
Large-Scale Training: Scaling to 1,000+ GPUs using high-performance data transfer (NCCL with GPUDirect RDMA, or GDRDMA) over InfiniBand/RoCE blog/AReaL_v0_2.md77-83.
Self-Evolving Data Synthesis: Integrated with engines like AReaL-SEA for synthetic data generation and RL refinement README.md68-74.
Customer Service Agents: Specialized support for training agents on benchmarks like τ²-Bench using multi-turn workflows README.md68-74.
Hardware Diversity: Native support for Ascend NPU devices via the ascend branch README.md82-87.
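GRPO, named above, replaces a learned value baseline with a group baseline: several responses are sampled for each prompt, and each reward is normalized against the statistics of its own group. The following is a minimal sketch in plain Python, assuming rewards arrive as one list per prompt and using the population standard deviation; `group_relative_advantages` is a hypothetical helper, not AReaL's API, and AReaL's own implementation may normalize differently.

```python
import statistics
from typing import List


def group_relative_advantages(
    grouped_rewards: List[List[float]], eps: float = 1e-6
) -> List[List[float]]:
    """Normalize each reward against the mean and std of its own prompt group.

    Hypothetical helper illustrating GRPO-style advantages; not AReaL's API.
    """
    advantages = []
    for rewards in grouped_rewards:
        mean = statistics.fmean(rewards)
        # Population std; a group with identical rewards yields all-zero advantages.
        std = statistics.pstdev(rewards)
        advantages.append([(r - mean) / (std + eps) for r in rewards])
    return advantages


# Two prompts, four sampled answers each; reward 1.0 means "verified correct".
adv = group_relative_advantages([[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
print([round(a, 3) for a in adv[0]])  # [1.0, -1.0, -1.0, 1.0]
print(adv[1])  # [0.0, 0.0, 0.0, 0.0]: this group carries no learning signal
```

Because the baseline comes from sibling samples rather than a critic network, no value model has to be trained or stored, at the cost of sampling several responses per prompt.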

Sources: README.md31-87 blog/AReaL_v0_2.md77-83 blog/AReaL_v0_1.md68-71 blog/AReaL_v0_3.md49-51

Performance Highlights

Throughput: v0.3 ("boba²") achieves up to a 2.77× speedup over synchronous systems by decoupling generation and training clusters blog/AReaL_v0_3.md9-11.
State-of-the-Art Results: Used to train state-of-the-art 7B models for mathematical reasoning (61.9 on AIME 2024) and competitive 32B models with minimal data blog/AReaL_v0_2.md11-31.
Variable-Length Optimization: Eliminates padding by packing sequences into 1D tensors, which improves GPU memory utilization blog/AReaL_v0_2.md69-75.

Sources: blog/AReaL_v0_3.md9-11 blog/AReaL_v0_2.md11-31 blog/AReaL_v0_2.md69-75

Core Capabilities

1. Fully Asynchronous Execution

The system decouples the InferenceEngine (rollout) from the TrainEngine (optimization), so generation and training run concurrently on separate resources. The Rollout Controller manages the flow between generation, reward assignment, and the replay buffer blog/AReaL_v0_3.md110-117.
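The decoupling can be illustrated with a single-process producer/consumer sketch: a rollout thread keeps generating trajectories stamped with the policy version it used, while the trainer consumes fixed-size batches from a bounded buffer and bumps the version after each step. Everything here (`Trajectory`, `VersionCounter`, `rollout_worker`, `train_loop`, and the `queue.Queue` standing in for the replay buffer) is hypothetical; it is not AReaL's distributed InferenceEngine, TrainEngine, or Rollout Controller, and generation, reward assignment, and optimization are stubbed out.

```python
import queue
import threading
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    prompt_id: int
    reward: float
    policy_version: int  # weight version that generated this sample


class VersionCounter:
    """Thread-safe policy version (hypothetical): the trainer bumps it, rollout reads it."""

    def __init__(self) -> None:
        self._version = 0
        self._lock = threading.Lock()

    def get(self) -> int:
        with self._lock:
            return self._version

    def bump(self) -> None:
        with self._lock:
            self._version += 1


def rollout_worker(policy: VersionCounter, buffer: queue.Queue, num_prompts: int) -> None:
    # Producer: generates with whatever weights are current; it blocks only
    # when the bounded buffer is full, never on a training step.
    for pid in range(num_prompts):
        version = policy.get()
        reward = float(pid % 2)  # stand-in for generation + reward assignment
        buffer.put(Trajectory(pid, reward, version))


def train_loop(policy: VersionCounter, buffer: queue.Queue, batch_size: int, steps: int) -> List[List[int]]:
    # Consumer: takes ready samples, runs a (stubbed) update, publishes new weights.
    seen = []
    for _ in range(steps):
        batch = [buffer.get() for _ in range(batch_size)]
        seen.append([t.policy_version for t in batch])
        policy.bump()  # stands in for the optimizer step + weight update
    return seen


BATCH, STEPS = 4, 8
policy = VersionCounter()
buffer = queue.Queue(maxsize=2 * BATCH)
# Produce exactly as many samples as the trainer consumes, so both sides finish.
producer = threading.Thread(target=rollout_worker, args=(policy, buffer, BATCH * STEPS))
producer.start()
versions = train_loop(policy, buffer, BATCH, STEPS)
producer.join()
print(versions)  # batch i only holds samples from versions <= i; older ones may linger
```

The exact version mix is timing-dependent; the bounded buffer is the only thing limiting how stale a consumed sample can be in this sketch.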

[Diagram: Asynchronous Training Pipeline Architecture]

Sources: blog/AReaL_v0_3.md85-117 CLAUDE.md12-24 AGENTS.md61-72

2. Multi-Backend Flexibility

AReaL provides modular engine abstractions, configured via AllocationType or explicit backend fields areal/api/alloc_mode.py16-21.

Interface	Implementations	Reference
Training	`FSDPEngine`, `MegatronEngine`, `ArchonEngine`	AGENTS.md87 CLAUDE.md14-15
Inference	`SGLangBackend`, `VLLMBackend`	CLAUDE.md14 AGENTS.md7
Algorithm	`GRPO`, `PPO`, `SFT`, `DPO`, `DAPO`	AGENTS.md86 CLAUDE.md131-132
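A common way to realize "choose the backend by name" is a registry that maps a configuration string to an engine class behind a shared interface. The sketch below shows only that pattern; the `TrainEngine` protocol, the `register` and `create_train_engine` helpers, and the `FSDPEngineStub`/`MegatronEngineStub` classes are hypothetical and do not reproduce the signatures of AReaL's `FSDPEngine` or `MegatronEngine`.

```python
from typing import Callable, Dict, List, Protocol, Type


class TrainEngine(Protocol):
    """Minimal interface every training backend must satisfy (hypothetical)."""

    def train_batch(self, batch: List[int]) -> float: ...


_REGISTRY: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    # Class decorator: record the backend under a config-friendly name.
    def wrap(cls: Type) -> Type:
        _REGISTRY[name] = cls
        return cls

    return wrap


@register("fsdp")
class FSDPEngineStub:
    def train_batch(self, batch: List[int]) -> float:
        return float(sum(batch))  # placeholder "loss"


@register("megatron")
class MegatronEngineStub:
    def train_batch(self, batch: List[int]) -> float:
        return float(len(batch))  # placeholder "loss"


def create_train_engine(backend: str) -> TrainEngine:
    # Fail fast on a misspelled backend name instead of at first use.
    if backend not in _REGISTRY:
        raise ValueError(f"unknown backend {backend!r}; choose from {sorted(_REGISTRY)}")
    return _REGISTRY[backend]()


engine = create_train_engine("fsdp")
print(type(engine).__name__, engine.train_batch([1, 2, 3]))  # FSDPEngineStub 6.0
```

Since the training loop depends only on the shared interface, switching backends becomes a configuration change rather than a code change.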

Sources: areal/api/alloc_mode.py16-21 AGENTS.md7-87 CLAUDE.md14-132

System Architecture Overview

Component Relationship Map

This diagram associates high-level system concepts with the configuration and engine entities defined in the codebase.

[Diagram: Code Entity Mapping: Natural Language to Code Space]

Sources: AGENTS.md61-70 CLAUDE.md12-19 areal/api/alloc_mode.py32-60

Data Flow: Sequence Packing

To handle variable sequence lengths efficiently, AReaL uses a dynamic allocation algorithm to pack sequences into 1D tensors without padding blog/AReaL_v0_2.md71-75.
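The idea can be sketched in plain Python. Sequences are concatenated into one flat token list, and a cumulative-offset array (often called `cu_seqlens` in variable-length attention kernels) records where each sequence starts, so no padding tokens are stored. Before packing, sequences are assigned to micro-batches under a per-micro-batch token budget with a longest-first greedy heuristic. That heuristic, the `max_tokens` parameter, and the function names are assumptions for illustration, not AReaL's actual allocation algorithm.

```python
from typing import List, Tuple


def allocate_microbatches(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Longest-first greedy assignment of sequence indices to micro-batches.

    Each sequence goes to the lightest micro-batch that still has room; a new
    micro-batch is opened only when none fits. Hypothetical heuristic.
    """
    if any(n > max_tokens for n in lengths):
        raise ValueError("a single sequence exceeds max_tokens")
    bins: List[List[int]] = []
    loads: List[int] = []
    for idx in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        fitting = [b for b in range(len(bins)) if loads[b] + lengths[idx] <= max_tokens]
        if fitting:
            b = min(fitting, key=lambda j: loads[j])
            bins[b].append(idx)
            loads[b] += lengths[idx]
        else:
            bins.append([idx])
            loads.append(lengths[idx])
    return bins


def pack(seqs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate sequences into one flat list plus cumulative offsets (no padding)."""
    flat: List[int] = []
    cu_seqlens = [0]
    for s in seqs:
        flat.extend(s)
        cu_seqlens.append(len(flat))
    return flat, cu_seqlens


seqs = [[1] * 5, [2] * 2, [3] * 7, [4] * 3, [5] * 1]
groups = allocate_microbatches([len(s) for s in seqs], max_tokens=8)
for g in groups:
    flat, cu = pack([seqs[i] for i in g])
    print(g, cu)
# prints:
# [2] [0, 7]
# [0, 3] [0, 5, 8]
# [1, 4] [0, 2, 3]
```

Padding these five sequences to the longest one (7 tokens) would store 35 token slots; the packed layout stores exactly the 18 real tokens, and sequence `k` in a micro-batch occupies `flat[cu[k]:cu[k + 1]]`.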

[Diagram: Data Flow: Token Processing]

Sources: blog/AReaL_v0_2.md69-75 blog/AReaL_v0_2.md132-137 AGENTS.md71 CLAUDE.md23

Key Differentiators

Staleness and Versioning

In asynchronous RL, trajectories may be composed of segments produced by different model versions. AReaL manages this through:

WeightUpdateMeta: Metadata for weight synchronization across clusters.
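Interruptible Rollout Workers: Rollout workers can interrupt ongoing generations to load new parameters via update_weights, discarding the KV caches computed with the old weights, recomputing them with the new weights, and then continuing the unfinished sequences blog/AReaL_v0_3.md91-96.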
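The bookkeeping this implies can be sketched as follows. When a rollout is interrupted for a weight update, the tokens generated so far stay in the trajectory and the remainder is produced by the new weights, so each trajectory becomes a list of (version, tokens) segments. A trainer can then measure staleness as the gap between its current version and the oldest segment and drop samples beyond a bound. The `Segment` and `Trajectory` classes, the `max_staleness` parameter, and the filtering rule are hypothetical illustrations, not the fields of AReaL's `WeightUpdateMeta` or its actual staleness control.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    version: int  # model version that generated these tokens
    tokens: List[int]


@dataclass
class Trajectory:
    segments: List[Segment] = field(default_factory=list)

    def extend(self, version: int, tokens: List[int]) -> None:
        # Output from the same weights joins the last segment; after an
        # interrupt + weight update, a new segment starts under the new version.
        if self.segments and self.segments[-1].version == version:
            self.segments[-1].tokens.extend(tokens)
        else:
            self.segments.append(Segment(version, list(tokens)))

    def oldest_version(self) -> int:
        return min(s.version for s in self.segments)

    def token_versions(self) -> List[int]:
        # One version tag per generated token.
        return [s.version for s in self.segments for _ in s.tokens]


def staleness(traj: Trajectory, trainer_version: int) -> int:
    return trainer_version - traj.oldest_version()


def filter_stale(trajs: List[Trajectory], trainer_version: int, max_staleness: int) -> List[Trajectory]:
    # Hypothetical rule: keep only samples whose oldest segment is recent enough.
    return [t for t in trajs if staleness(t, trainer_version) <= max_staleness]


t = Trajectory()
t.extend(version=3, tokens=[11, 12])
t.extend(version=3, tokens=[13])      # same weights, same segment
t.extend(version=4, tokens=[14, 15])  # interrupted, resumed after update_weights
print(t.token_versions())               # [3, 3, 3, 4, 4]
print(staleness(t, trainer_version=5))  # 2
print(len(filter_stale([t], trainer_version=5, max_staleness=1)))  # 0
```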

Sources: blog/AReaL_v0_3.md91-99

Advanced Parallelism

AReaL supports complex parallelism strategies beyond simple Data Parallelism (DP) via the ParallelStrategy class areal/api/alloc_mode.py33-60:

Tensor Parallelism (TP): Shards the weight matrices within each layer, and the matrix multiplications that use them, across devices areal/api/alloc_mode.py40.
Pipeline Parallelism (PP): Assigns consecutive groups of layers to different devices, which process micro-batches in a pipelined fashion areal/api/alloc_mode.py41.
Context Parallelism (CP): Splits the sequence dimension across devices, mainly to make long-context attention fit in memory areal/api/alloc_mode.py43.
Expert Parallelism (EP): Designed for MoE models, placing different experts on different devices areal/api/alloc_mode.py44.
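One way to reason about these dimensions is that their product must tile the available devices, with data parallelism absorbing whatever is left. The sketch below checks that and maps a global rank to coordinates in a DP × PP × CP × TP grid. The `ParallelDims` class and the rank ordering (TP varying fastest, a common convention that keeps the most communication-heavy peers close together) are assumptions for illustration, not the fields or semantics of AReaL's `ParallelStrategy`. EP is left out because how it overlaps with the other dimensions differs between frameworks.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ParallelDims:
    """Hypothetical stand-in for a parallel strategy; not AReaL's ParallelStrategy."""

    world_size: int
    tp: int = 1  # tensor parallel degree
    pp: int = 1  # pipeline parallel degree
    cp: int = 1  # context parallel degree

    @property
    def dp(self) -> int:
        # Data parallelism absorbs the devices left after model parallelism.
        model_parallel = self.tp * self.pp * self.cp
        if self.world_size % model_parallel != 0:
            raise ValueError(
                f"world_size={self.world_size} not divisible by tp*pp*cp={model_parallel}"
            )
        return self.world_size // model_parallel

    def coords(self, rank: int) -> Dict[str, int]:
        # Assumed ordering: tp varies fastest, then cp, then pp, then dp.
        if not 0 <= rank < self.world_size:
            raise ValueError("rank out of range")
        tp_rank = rank % self.tp
        rest = rank // self.tp
        cp_rank = rest % self.cp
        rest //= self.cp
        pp_rank = rest % self.pp
        dp_rank = rest // self.pp
        return {"dp": dp_rank, "pp": pp_rank, "cp": cp_rank, "tp": tp_rank}


dims = ParallelDims(world_size=16, tp=2, pp=2, cp=2)
print(dims.dp)          # 2
print(dims.coords(13))  # {'dp': 1, 'pp': 1, 'cp': 0, 'tp': 1}
```

With 16 devices and TP = PP = CP = 2, each model replica spans 8 devices, so two data-parallel replicas remain.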

Sources: areal/api/alloc_mode.py33-160


URL: https://deepwiki.com/inclusionAI/AReaL/1.1-what-is-areal
