Last indexed: 7 May 2026 (2e12c1)

AReaL Overview

Purpose: This document provides a high-level introduction to the AReaL system, its architectural philosophy, and how its major components interact. For detailed information about specific subsystems, refer to the child pages: installation procedures (Setup and Installation), basic usage (Quick Start Guide), detailed component interactions (Architecture Overview), and core innovations (Key Innovations).

Scope: This overview covers the system's purpose, core components, technology stack, and design principles. For algorithm-specific details, see Algorithm Overview. For configuration details, see Configuration System.

What is AReaL

AReaL (Asynchronous Reinforcement Learning) is a distributed training framework for large language model alignment using reinforcement learning. Developed by researchers and engineers from Tsinghua IIIS and the AReaL Team at Ant Group, the system enables a fully asynchronous RL training paradigm, achieving over 2.77× speedup compared to synchronous approaches while maintaining training stability. It is specifically optimized for efficiency and scalability, making it particularly well-suited for training large-scale reasoning and agentic models. README.md1-24 README.md102-107 CLAUDE.md5-9 CLAUDE.md30-32

Key Design Highlights:

Goal	Implementation
Asynchronous Execution	Fully decoupled generation and training clusters with interruptible rollout workers. README.md102-107 blog/AReaL_v0_3.md79-88
Multi-Backend Flexibility	Pluggable training backends (FSDP2, Megatron, Archon) and inference backends (SGLang, vLLM). AGENTS.md7 CLAUDE.md8-9
Scalability	Support for multi-dimensional parallelism (DP, TP, CP, PP, EP) and GDRDMA-optimized data transfer for 1K-GPU scaling. blog/AReaL_v0_2.md77-83
Agentic RL Integration	Native support for multi-turn conversations, tool-integrated reasoning, and external agent runtimes like OpenClaw. README.md43-46 README.md61-63
Cutting-Edge Performance	State-of-the-art results in math, coding, search, and customer service agents. README.md37-39 blog/AReaL_v0_3.md17-30

Sources: README.md1-40 README.md102-107 CLAUDE.md30-32 AGENTS.md7-8 blog/AReaL_v0_2.md77-83 blog/AReaL_v0_3.md79-88

Core Architecture Components

AReaL's architecture consists of five primary component types, each with well-defined responsibilities:

[Diagram: AReaL System Component Map]

Component Responsibilities:

Component	Key Classes	Primary Role
Trainer	`PPOTrainer`, `SFTTrainer`, `GRPOTrainer`, `DPOTrainer`	Orchestrates training loop, manages dataset loading, coordinates rollout and training phases. AGENTS.md148-150 CLAUDE.md70
TrainEngine	`FSDPEngine`, `MegatronEngine`, `ArchonEngine`	Executes forward/backward passes, manages optimizer, handles distributed training. AGENTS.md85-87
InferenceEngine	`RemoteSGLangEngine`, `RemotevLLMEngine`	Generates rollouts asynchronously, receives weight updates, manages inference servers. CLAUDE.md14
RolloutWorkflow	`RLVRWorkflow`, `MultiTurnWorkflow`, `AgentWorkflow`	Defines episode generation logic via `arun_episode`, computes rewards, produces training tensors. AGENTS.md88 AGENTS.md105
Scheduler	`LocalScheduler`, `RayScheduler`, `SlurmScheduler`	Allocates workers, creates engine instances, manages RPC calls. AGENTS.md65 CLAUDE.md18
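To make the RolloutWorkflow row concrete, here is a minimal sketch of an `arun_episode`-style workflow driven by a stand-in engine. Only the method name `arun_episode` and the response fields come from this page; `FakeInferenceEngine`, `agenerate`, `EchoWorkflow`, and the returned dictionary layout are hypothetical and are not AReaL's actual API.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    # Mirrors the fields this page lists for ModelResponse.
    output_tokens: list[int]
    logprobs: list[float]
    output_versions: list[int]


class FakeInferenceEngine:
    """Stand-in for an inference engine (hypothetical, not AReaL's class)."""

    async def agenerate(self, input_ids: list[int]) -> FakeResponse:
        await asyncio.sleep(0)  # yield control, as a real remote call would
        tokens = [t + 1 for t in input_ids]
        return FakeResponse(tokens, [-0.1] * len(tokens), [0] * len(tokens))


class EchoWorkflow:
    """Hypothetical workflow: one prompt in, one scored trajectory out."""

    def __init__(self, target_len: int):
        self.target_len = target_len

    async def arun_episode(self, engine, data: dict) -> dict:
        resp = await engine.agenerate(data["input_ids"])
        # Toy reward: 1.0 if the completion has the expected length.
        reward = 1.0 if len(resp.output_tokens) == self.target_len else 0.0
        return {
            "input_ids": data["input_ids"] + resp.output_tokens,
            "logprobs": resp.logprobs,
            "versions": resp.output_versions,
            "reward": reward,
        }


async def run_batch(prompts):
    engine, workflow = FakeInferenceEngine(), EchoWorkflow(target_len=3)
    # Episodes run concurrently; each one awaits its own generation call.
    return await asyncio.gather(
        *(workflow.arun_episode(engine, {"input_ids": p}) for p in prompts)
    )


episodes = asyncio.run(run_batch([[1, 2, 3], [4, 5]]))
```

Because each episode is a coroutine, many episodes can be in flight at once without blocking one another, which is the property an asynchronous rollout pipeline relies on.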

Sources: README.md15-23 AGENTS.md56-74 CLAUDE.md10-26 AGENTS.md148-150

Asynchronous Training Data Flow

The following diagram illustrates how data flows through the asynchronous training pipeline, showing the separation between rollout generation and model training:

[Diagram: Asynchronous RL Pipeline Flow]

Key Data Structures:

`ModelRequest`: Input specification for inference containing `input_ids`, `GenerationHyperparameters`, and vision data. CLAUDE.md13
`ModelResponse`: Inference output containing `output_tokens`, `logprobs`, and `output_versions`. CLAUDE.md13
`WeightUpdateMeta`: Weight synchronization metadata specifying the update type (disk/nccl/xccl) and version number. AGENTS.md61

Offpolicyness Control: The system maintains training stability by checking version staleness. A rollout is considered stale when the policy version that generated it lags the current training version by more than the configured `max_head_offpolicyness`; stale rollouts are discarded to prevent training on outdated trajectories. README.md102-107 blog/AReaL_v0_3.md105-108
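The staleness check described above can be sketched as a simple filter. This is an illustration only: the `Rollout` class and both helper functions are hypothetical, and measuring staleness from the oldest token version in a trajectory is an assumption about how "head" offpolicyness is defined, not a statement of AReaL's implementation.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    tokens: list[int]
    output_versions: list[int]  # policy version that produced each token


def is_fresh(rollout: Rollout, current_version: int, max_head_offpolicyness: int) -> bool:
    """Keep a rollout only if its oldest token is within the staleness budget."""
    head_version = min(rollout.output_versions)  # assumed definition of "head"
    return current_version - head_version <= max_head_offpolicyness


def filter_stale(rollouts, current_version: int, max_head_offpolicyness: int):
    """Return the rollouts that are still fresh enough to train on."""
    return [
        r for r in rollouts
        if is_fresh(r, current_version, max_head_offpolicyness)
    ]


batch = [
    Rollout(tokens=[1, 2], output_versions=[5, 5]),  # on-policy
    Rollout(tokens=[3, 4], output_versions=[3, 4]),  # lag of 2, still allowed
    Rollout(tokens=[5, 6], output_versions=[1, 5]),  # lag of 4, too stale
]
kept = filter_stale(batch, current_version=5, max_head_offpolicyness=2)
```

With a budget of 2, the first two rollouts are kept and the third is dropped. A budget of 0 would reduce this to strictly on-policy training.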

Sources: README.md79-84 AGENTS.md143-148 CLAUDE.md13-17 blog/AReaL_v0_3.md105-108

Multi-Backend Architecture

AReaL supports multiple training and inference backends through abstract interfaces, enabling users to select backends based on their hardware (e.g., CUDA or Ascend NPU) and model requirements:

[Diagram: Engine Abstraction and Synchronization]

Backend Selection via Configuration: Backends are configured via the `actor.backend` and `rollout.backend` fields, for example `fsdp:d8` or `sglang:d1p1t1`. Specialized hardware such as Ascend NPU is supported via the `ascend` branch. README.md82-87 areal/api/alloc_mode.py19-22
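To show how a spec string such as `fsdp:d8` or `sglang:d1p1t1` might be read, here is a small hypothetical parser. The function `parse_backend_spec` is not part of AReaL, and the letter-to-dimension mapping (`d` data, `p` pipeline, `t` tensor, `c` context, `e` expert) is an assumption inferred from the examples and the parallelism dimensions listed on this page.

```python
import re

# Assumed meaning of each dimension letter (not confirmed by the source).
_DIM_NAMES = {"d": "data", "t": "tensor", "p": "pipeline", "c": "context", "e": "expert"}


def parse_backend_spec(spec: str) -> tuple[str, dict[str, int]]:
    """Split 'backend:dims' into a backend name and a dimension-size mapping."""
    backend, sep, dims = spec.partition(":")
    if not sep or not backend:
        raise ValueError(f"expected '<backend>:<dims>', got {spec!r}")

    pairs = re.findall(r"([a-z])(\d+)", dims)
    # Reject specs with leftover characters the pattern did not consume.
    if not pairs or "".join(k + v for k, v in pairs) != dims:
        raise ValueError(f"malformed dimension string {dims!r}")

    sizes: dict[str, int] = {}
    for letter, value in pairs:
        if letter not in _DIM_NAMES:
            raise ValueError(f"unknown dimension letter {letter!r}")
        name = _DIM_NAMES[letter]
        if name in sizes:
            raise ValueError(f"dimension {name!r} given twice")
        sizes[name] = int(value)
    return backend, sizes


actor_backend = parse_backend_spec("fsdp:d8")
rollout_backend = parse_backend_spec("sglang:d1p1t1")
```

Under these assumptions, `fsdp:d8` reads as the FSDP backend with a data-parallel size of 8, and `sglang:d1p1t1` as SGLang with data, pipeline, and tensor sizes of 1 each.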

Sources: AGENTS.md7 CLAUDE.md8-9 AGENTS.md85-88 README.md82-87 areal/api/alloc_mode.py19-22

Parallelism and Resource Allocation

AReaL provides a flexible ParallelStrategy that supports 5D parallelism, crucial for training large-scale models and Mixture-of-Experts (MoE) architectures.

Dimension	Description	Key Variable
Tensor	Splits individual operations across devices	`tensor_parallel_size` areal/api/alloc_mode.py50
Pipeline	Splits model layers across devices	`pipeline_parallel_size` areal/api/alloc_mode.py51
Data	Replicates the model and splits data	`data_parallel_size` areal/api/alloc_mode.py52
Context	Splits sequence length (attention-specific)	`context_parallel_size` areal/api/alloc_mode.py53
Expert	Splits experts in MoE models	`expert_parallel_size` areal/api/alloc_mode.py54
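As a rough illustration of how the five sizes relate to a GPU count, the sketch below models them as a frozen dataclass. The field names match the key variables in the table; the class `ParallelSketch` and its `world_size` rule are hypothetical. In particular, leaving expert parallelism out of the product follows one common convention, in which experts are partitioned across ranks that already exist, and is an assumption rather than a description of AReaL's `ParallelStrategy`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelSketch:
    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1
    data_parallel_size: int = 1
    context_parallel_size: int = 1
    expert_parallel_size: int = 1

    def __post_init__(self):
        # Every dimension must be a positive degree of parallelism.
        for name, value in vars(self).items():
            if value < 1:
                raise ValueError(f"{name} must be >= 1, got {value}")

    @property
    def world_size(self) -> int:
        # Assumption: expert parallelism re-partitions existing ranks,
        # so it does not multiply the number of devices.
        return (
            self.tensor_parallel_size
            * self.pipeline_parallel_size
            * self.data_parallel_size
            * self.context_parallel_size
        )


dense = ParallelSketch(tensor_parallel_size=2, pipeline_parallel_size=2, data_parallel_size=4)
moe = ParallelSketch(
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
    data_parallel_size=4,
    expert_parallel_size=4,
)
```

Both layouts above occupy 16 devices under this convention; the MoE layout additionally spreads its experts across groups of 4 ranks.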

Sources: areal/api/alloc_mode.py33-60 CLAUDE.md15-17

Supported Capabilities

Algorithms

AReaL supports a wide range of RL and alignment algorithms, often with specific optimizations for reasoning tasks (e.g., removing the critic to save compute).

Algorithm	Type	Description
GRPO	On-Policy RL	Group Relative Policy Optimization. blog/AReaL_v0_2.md127
PPO	On-Policy RL	Proximal Policy Optimization. blog/AReaL_v0_1.md68
DAPO	On-Policy RL	Decoupled Clip and Dynamic sAmpling Policy Optimization. blog/AReaL_v0_2.md104
DPO	Offline RL	Direct Preference Optimization. Table of Contents Section 7.8
SFT	Supervised	Supervised Fine-Tuning. AGENTS.md148
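The remark about removing the critic can be illustrated with the group-relative advantage used by GRPO: rather than learning a value function, each response is scored against the other responses sampled for the same prompt. The sketch below is a plain-Python illustration of that idea, not AReaL's implementation; the function name, the use of the population standard deviation, and the `eps` value are choices made here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to the same prompt, two correct and two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# If every response gets the same reward, the group carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct responses receive an advantage near +1 and incorrect ones near -1, with no critic network involved. A group whose rewards are all identical yields all-zero advantages.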

Sources: blog/AReaL_v0_1.md68-72 blog/AReaL_v0_2.md127-130 AGENTS.md148-150

Key Innovations

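Interruptible Rollout Worker: When a weight update arrives, in-flight generation is interrupted and KV caches computed under the old weights are discarded, so decoding continues with the latest policy. As a result, a single trajectory may contain segments produced by different model versions. blog/AReaL_v0_3.md91-99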
Tree Training: Optimizes training by sharing prefix computations across rollouts with the same prompt, supported across FSDP, Megatron, and Archon. examples/tau2/README.md154-156
High-Performance Data Transfer: Utilizes NCCL with GPU-Direct RDMA (GDRDMA) to keep transfer overhead under 3 seconds in 1,000-GPU clusters. blog/AReaL_v0_2.md77-83
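The interruptible rollout idea from the first item can be sketched with a toy decoding loop. Everything here is a simulation under stated assumptions: `ToyPolicy`, `interruptible_generate`, and the `update_at` argument are hypothetical, the "KV cache" is reduced to a version tag and a rebuild counter, and weight updates are injected at fixed step indices instead of arriving from a trainer.

```python
class ToyPolicy:
    """Deterministic stand-in for a model whose output depends on its version."""

    def __init__(self):
        self.version = 0

    def next_token(self, context: list[int]) -> int:
        return (sum(context) + self.version) % 100


def interruptible_generate(policy, prompt, max_new_tokens, update_at):
    """Decode token by token, rebuilding the cache whenever weights change.

    update_at: set of step indices at which a simulated weight update lands.
    """
    context = list(prompt)
    output_tokens, output_versions = [], []
    cache_version = policy.version
    cache_rebuilds = 0

    for step in range(max_new_tokens):
        if step in update_at:
            policy.version += 1  # simulated weight update from the trainer

        if policy.version != cache_version:
            # Cache entries from the old weights are invalid: discard and recompute.
            cache_rebuilds += 1
            cache_version = policy.version

        token = policy.next_token(context)
        context.append(token)
        output_tokens.append(token)
        output_versions.append(policy.version)  # per-token version tag

    return output_tokens, output_versions, cache_rebuilds


tokens, versions, rebuilds = interruptible_generate(
    ToyPolicy(), prompt=[1, 2], max_new_tokens=4, update_at={2}
)
```

Here the update lands before the third token, so the trajectory carries versions `[0, 0, 1, 1]` and the cache is rebuilt once. Recording a version per token is what lets a trainer later reason about how stale each segment is.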

Getting Started

Installation

AReaL uses uv for high-performance dependency management.

README.md124-131 AGENTS.md11

Running Your First Experiment

Training is typically launched via scripts in the examples/ directory using configuration files.

README.md144-146

Next Steps

What is AReaL: Detailed comparison and target use cases.
Setup and Installation: Complete installation guide including Docker and NPU support.
Architecture Overview: Deep dive into component interactions and the scheduler system.
Key Innovations: Detailed look at asynchronous training and weight versioning.
Algorithm Overview: Technical details of PPO, GRPO, and DPO implementations.

URL: https://deepwiki.com/inclusionAI/AReaL/1-areal-overview
