URL: https://deepwiki.com/inclusionAI/AReaL/1-areal-overview

inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

AReaL Overview

Purpose: This document provides a high-level introduction to the AReaL system, its architectural philosophy, and how its major components interact. For detailed information about specific subsystems, refer to the child pages: installation procedures (Setup and Installation), basic usage (Quick Start Guide), detailed component interactions (Architecture Overview), and core innovations (Key Innovations).

Scope: This overview covers the system's purpose, core components, technology stack, and design principles. For algorithm-specific details, see Algorithm Overview. For configuration details, see Configuration System.

What is AReaL

AReaL (Asynchronous Reinforcement Learning) is a distributed training framework for aligning large language models with reinforcement learning. Developed by researchers and engineers from Tsinghua IIIS and the AReaL Team at Ant Group, it implements a fully asynchronous RL training paradigm that achieves up to a 2.77× speedup over synchronous approaches while maintaining training stability. The system is optimized for efficiency and scalability, which makes it well suited to training large-scale reasoning and agentic models. README.md1-24 README.md102-107 CLAUDE.md5-9 CLAUDE.md30-32

Key Design Highlights:

| Goal | Implementation |
| --- | --- |
| Asynchronous Execution | Fully decoupled generation and training clusters with interruptible rollout workers. README.md102-107 blog/AReaL_v0_3.md79-88 |
| Multi-Backend Flexibility | Pluggable training backends (FSDP2, Megatron, Archon) and inference backends (SGLang, vLLM). AGENTS.md7 CLAUDE.md8-9 |
| Scalability | Support for multi-dimensional parallelism (DP, TP, CP, PP, EP) and GDRDMA-optimized data transfer for 1K-GPU scaling. blog/AReaL_v0_2.md77-83 |
| Agentic RL Integration | Native support for multi-turn conversations, tool-integrated reasoning, and external agent runtimes like OpenClaw. README.md43-46 README.md61-63 |
| Cutting-Edge Performance | State-of-the-art results in math, coding, search, and customer service agents. README.md37-39 blog/AReaL_v0_3.md17-30 |

Sources: README.md1-40 README.md102-107 CLAUDE.md30-32 AGENTS.md7-8 blog/AReaL_v0_2.md77-83 blog/AReaL_v0_3.md79-88

Core Architecture Components

AReaL's architecture consists of five primary component types, each with well-defined responsibilities:

[Diagram: AReaL System Component Map]

Component Responsibilities:

| Component | Key Classes | Primary Role |
| --- | --- | --- |
| Trainer | PPOTrainer, SFTTrainer, GRPOTrainer, DPOTrainer | Orchestrates the training loop, manages dataset loading, and coordinates rollout and training phases. AGENTS.md148-150 CLAUDE.md70 |
| TrainEngine | FSDPEngine, MegatronEngine, ArchonEngine | Executes forward/backward passes, manages the optimizer, and handles distributed training. AGENTS.md85-87 |
| InferenceEngine | RemoteSGLangEngine, RemotevLLMEngine | Generates rollouts asynchronously, receives weight updates, and manages inference servers. CLAUDE.md14 |
| RolloutWorkflow | RLVRWorkflow, MultiTurnWorkflow, AgentWorkflow | Defines episode generation logic via arun_episode, computes rewards, and produces training tensors. AGENTS.md88 AGENTS.md105 |
| Scheduler | LocalScheduler, RayScheduler, SlurmScheduler | Allocates workers, creates engine instances, and manages RPC calls. AGENTS.md65 CLAUDE.md18 |

Sources: README.md15-23 AGENTS.md56-74 CLAUDE.md10-26 AGENTS.md148-150
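The division of labor between a RolloutWorkflow and an InferenceEngine can be illustrated with a minimal asyncio sketch of the arun_episode pattern. Everything here is hypothetical and simplified: StubInferenceEngine, its agenerate method, Episode, and ToyRLVRWorkflow are invented for illustration and are not AReaL's actual classes or signatures, and the "generation" step is a stand-in that simply reverses the prompt.

```python
import asyncio
from dataclasses import dataclass


class StubInferenceEngine:
    """Hypothetical stand-in for an inference engine (not AReaL's API)."""

    def __init__(self, version: int = 0):
        self.version = version  # policy version currently loaded

    async def agenerate(self, input_ids: list[int]) -> list[int]:
        await asyncio.sleep(0)  # yield to the event loop, as a remote call would
        return list(reversed(input_ids))  # toy "generation"


@dataclass
class Episode:
    input_ids: list[int]
    output_tokens: list[int]
    reward: float
    version: int


class ToyRLVRWorkflow:
    """Hypothetical workflow: one generation per prompt, verifiable reward."""

    def __init__(self, target: list[int]):
        self.target = target

    def reward_fn(self, output_tokens: list[int]) -> float:
        # Binary verifiable reward: 1.0 only on an exact match with the target.
        return 1.0 if output_tokens == self.target else 0.0

    async def arun_episode(self, engine: StubInferenceEngine, input_ids: list[int]) -> Episode:
        output = await engine.agenerate(input_ids)
        return Episode(input_ids, output, self.reward_fn(output), engine.version)


async def collect(prompts: list[list[int]]) -> list[Episode]:
    engine = StubInferenceEngine(version=3)
    workflow = ToyRLVRWorkflow(target=[3, 2, 1])
    # Episodes run concurrently; gather returns them in submission order.
    return await asyncio.gather(*(workflow.arun_episode(engine, p) for p in prompts))


episodes = asyncio.run(collect([[1, 2, 3], [4, 5]]))
for ep in episodes:
    print(ep.output_tokens, ep.reward, ep.version)
```

Because each episode is a coroutine, many episodes can be in flight at once, which is what allows rollout generation to proceed independently of the training step.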

Asynchronous Training Data Flow

The following diagram illustrates how data flows through the asynchronous training pipeline, showing the separation between rollout generation and model training:

[Diagram: Asynchronous RL Pipeline Flow]

Key Data Structures:

  • ModelRequest: Input specification for inference containing input_ids, GenerationHyperparameters, and vision data. CLAUDE.md13
  • ModelResponse: Inference output containing output_tokens, logprobs, and output_versions. CLAUDE.md13
  • WeightUpdateMeta: Weight synchronization metadata specifying update type (disk/nccl/xccl) and version number. AGENTS.md61
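The three structures can be approximated with plain dataclasses to show how per-token provenance travels with a response. This is a simplified, hypothetical mirror: input_ids, output_tokens, logprobs, output_versions, the update type, and the version number come from the list above, while the field names gconfig, image_data, type, and version, the GenerationHyperparameters fields, and the version_range helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class GenerationHyperparameters:
    # Illustrative subset of sampling settings (hypothetical fields).
    max_new_tokens: int = 512
    temperature: float = 1.0


@dataclass
class ModelRequest:
    input_ids: list[int]
    gconfig: GenerationHyperparameters = field(default_factory=GenerationHyperparameters)
    image_data: Optional[list[bytes]] = None  # hypothetical slot for vision inputs


@dataclass
class ModelResponse:
    output_tokens: list[int]
    logprobs: list[float]
    output_versions: list[int]  # policy version that produced each token

    def __post_init__(self):
        # Per-token fields must line up one-to-one.
        if not (len(self.output_tokens) == len(self.logprobs) == len(self.output_versions)):
            raise ValueError("per-token fields must have equal length")

    def version_range(self) -> tuple[int, int]:
        # Oldest and newest policy versions that contributed tokens.
        return min(self.output_versions), max(self.output_versions)


@dataclass
class WeightUpdateMeta:
    type: Literal["disk", "nccl", "xccl"]
    version: int

    def __post_init__(self):
        if self.type not in ("disk", "nccl", "xccl"):
            raise ValueError(f"unknown update type: {self.type}")


req = ModelRequest(input_ids=[101, 7592])
resp = ModelResponse(output_tokens=[11, 12, 13], logprobs=[-0.1, -0.5, -0.2], output_versions=[4, 4, 5])
meta = WeightUpdateMeta(type="nccl", version=5)
print(req.gconfig.temperature, resp.version_range(), meta)
```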

Off-policyness Control: The system maintains training stability by checking version staleness. Rollouts generated by policies that lag the current model by more than the configured max_head_offpolicyness are discarded, which prevents training on outdated trajectories. README.md102-107 blog/AReaL_v0_3.md105-108

Sources: README.md79-84 AGENTS.md143-148 CLAUDE.md13-17 blog/AReaL_v0_3.md105-108
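A minimal sketch of such a staleness filter, assuming each trajectory records the policy version of every generated token and that staleness is measured from the oldest token. The Trajectory class and the staleness and filter_stale functions are hypothetical; only the max_head_offpolicyness name comes from the configuration described above.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    prompt_id: str
    output_versions: list[int]  # policy version that produced each generated token


def staleness(traj: Trajectory, current_version: int) -> int:
    # Assumption: a trajectory is only as fresh as its oldest token.
    return current_version - min(traj.output_versions)


def filter_stale(batch: list[Trajectory], current_version: int, max_head_offpolicyness: int):
    kept, dropped = [], []
    for traj in batch:
        if staleness(traj, current_version) <= max_head_offpolicyness:
            kept.append(traj)
        else:
            dropped.append(traj)
    return kept, dropped


batch = [
    Trajectory("a", [8, 9]),  # staleness 2
    Trajectory("b", [7, 8]),  # staleness 3
    Trajectory("c", [10]),    # staleness 0
]
kept, dropped = filter_stale(batch, current_version=10, max_head_offpolicyness=2)
print([t.prompt_id for t in kept], [t.prompt_id for t in dropped])  # ['a', 'c'] ['b']
```

In this sketch, a threshold of 0 keeps only trajectories generated entirely by the current policy; larger values accept older data in exchange for less waiting on fresh rollouts.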

Multi-Backend Architecture

AReaL supports multiple training and inference backends through abstract interfaces, enabling users to select backends based on their hardware (e.g., CUDA or Ascend NPU) and model requirements:

[Diagram: Engine Abstraction and Synchronization]

Backend Selection via Configuration: Backends are selected through the actor.backend and rollout.backend configuration fields, using values such as fsdp:d8 or sglang:d1p1t1. The system also supports specialized hardware such as Ascend NPU via the ascend branch. README.md82-87 areal/api/alloc_mode.py19-22

Sources: AGENTS.md7 CLAUDE.md8-9 AGENTS.md85-88 README.md82-87 areal/api/alloc_mode.py19-22
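To make the backend strings concrete, the hypothetical parser below splits a value such as sglang:d1p1t1 into a backend name and parallel degrees, reading d, p, and t as data-, pipeline-, and tensor-parallel sizes. That reading of the letters and the parse_backend_spec and num_gpus helpers are assumptions for illustration; AReaL's own parsing lives in areal/api/alloc_mode.py and may accept a richer syntax.

```python
import re
from math import prod

# Hypothetical letter-to-dimension mapping used by this sketch only.
DIM_NAMES = {"d": "data", "p": "pipeline", "t": "tensor"}


def parse_backend_spec(spec: str) -> tuple[str, dict[str, int]]:
    """Split 'sglang:d1p1t1' into ('sglang', {'data': 1, 'pipeline': 1, 'tensor': 1})."""
    backend, sep, dims = spec.partition(":")
    if not backend or not sep or not re.fullmatch(r"(?:[dpt]\d+)+", dims):
        raise ValueError(f"expected '<backend>:<dims>' such as 'fsdp:d8', got {spec!r}")
    pairs = re.findall(r"([dpt])(\d+)", dims)
    sizes = {DIM_NAMES[letter]: int(n) for letter, n in pairs}
    if len(sizes) != len(pairs):
        raise ValueError(f"dimension repeated in {dims!r}")
    if any(n < 1 for n in sizes.values()):
        raise ValueError("parallel degrees must be positive")
    return backend, sizes


def num_gpus(sizes: dict[str, int]) -> int:
    # Each (d, p, t) coordinate is one rank, so the degrees multiply.
    return prod(sizes.values())


print(parse_backend_spec("fsdp:d8"))  # ('fsdp', {'data': 8})
backend, sizes = parse_backend_spec("sglang:d2p1t4")
print(backend, num_gpus(sizes))  # sglang 8
```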

Parallelism and Resource Allocation

AReaL provides a flexible ParallelStrategy that supports 5D parallelism (tensor, pipeline, data, context, and expert), which is crucial for training large-scale models and Mixture-of-Experts (MoE) architectures.

| Dimension | Description | Key Variable |
| --- | --- | --- |
| Tensor | Splits individual operations across devices | tensor_parallel_size areal/api/alloc_mode.py50 |
| Pipeline | Splits model layers across devices | pipeline_parallel_size areal/api/alloc_mode.py51 |
| Data | Replicates the model and splits data | data_parallel_size areal/api/alloc_mode.py52 |
| Context | Splits sequence length (attention-specific) | context_parallel_size areal/api/alloc_mode.py53 |
| Expert | Splits experts in MoE models | expert_parallel_size areal/api/alloc_mode.py54 |

Sources: areal/api/alloc_mode.py33-60 CLAUDE.md15-17
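How these sizes translate into a rank layout can be sketched for three of the dimensions. The ParallelLayout class below is hypothetical, not AReaL's ParallelStrategy, and it assumes an ordering in which tensor-parallel ranks are adjacent (fastest-varying), a common choice because tensor parallelism is the most communication-heavy dimension and benefits from staying within a node. The actual ordering used by AReaL's backends may differ, and context and expert parallelism are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical 3-D rank layout (DP x PP x TP) with TP ranks adjacent."""
    data_parallel_size: int
    pipeline_parallel_size: int
    tensor_parallel_size: int

    @property
    def world_size(self) -> int:
        return self.data_parallel_size * self.pipeline_parallel_size * self.tensor_parallel_size

    def coords(self, rank: int) -> tuple[int, int, int]:
        """Return (dp, pp, tp) indices for a global rank; tp varies fastest."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world of size {self.world_size}")
        tp = rank % self.tensor_parallel_size
        pp = (rank // self.tensor_parallel_size) % self.pipeline_parallel_size
        dp = rank // (self.tensor_parallel_size * self.pipeline_parallel_size)
        return dp, pp, tp

    def tensor_group(self, rank: int) -> list[int]:
        """Ranks that shard the same layers together (same dp and pp index)."""
        dp, pp, _ = self.coords(rank)
        return [r for r in range(self.world_size) if self.coords(r)[:2] == (dp, pp)]


layout = ParallelLayout(data_parallel_size=2, pipeline_parallel_size=2, tensor_parallel_size=2)
print(layout.world_size)       # 8
print(layout.coords(5))        # (1, 0, 1)
print(layout.tensor_group(5))  # [4, 5]
```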

Supported Capabilities

Algorithms

AReaL supports a wide range of RL and alignment algorithms, often with specific optimizations for reasoning tasks (e.g., removing the critic to save compute).

| Algorithm | Type | Description |
| --- | --- | --- |
| GRPO | On-Policy RL | Group Relative Policy Optimization. blog/AReaL_v0_2.md127 |
| PPO | On-Policy RL | Proximal Policy Optimization. blog/AReaL_v0_1.md68 |
| DAPO | On-Policy RL | Decoupled Clip and Dynamic Sampling Policy Optimization. blog/AReaL_v0_2.md104 |
| DPO | Offline RL | Direct Preference Optimization. Table of Contents Section 7.8 |
| SFT | Supervised | Supervised Fine-Tuning. AGENTS.md148 |

Sources: blog/AReaL_v0_1.md68-72 blog/AReaL_v0_2.md127-130 AGENTS.md148-150
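GRPO's key simplification, dropping the learned critic, fits in a few lines: each response's advantage is its reward normalized against the other responses sampled for the same prompt. The sketch below is a generic illustration of that group-relative normalization, not AReaL's implementation; the eps term and the use of the population standard deviation are choices made here.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for the responses sampled from one prompt."""
    mean = statistics.fmean(rewards)   # group mean replaces a learned value baseline
    std = statistics.pstdev(rewards)   # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


print([round(a, 3) for a in group_relative_advantages([1.0, 0.0, 0.0, 1.0])])  # [1.0, -1.0, -1.0, 1.0]
print(group_relative_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

The second call shows a known weakness: an all-correct or all-wrong group yields zero advantage for every sample and therefore no gradient. DAPO's dynamic sampling addresses this by filtering such groups out of the batch.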

Key Innovations

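  • Interruptible Rollout Worker: When new weights arrive, in-flight generations are interrupted, KV caches computed with the old weights are discarded, and decoding resumes with the updated model. A single trajectory may therefore contain segments produced by different model versions. blog/AReaL_v0_3.md91-99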
  • Tree Training: Optimizes training by sharing prefix computations across rollouts with the same prompt, supported across FSDP, Megatron, and Archon. examples/tau2/README.md154-156
  • High-Performance Data Transfer: Utilizes NCCL with GPU-Direct RDMA (GDRDMA) to keep transfer overhead under 3 seconds in 1,000-GPU clusters. blog/AReaL_v0_2.md77-83
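The bookkeeping behind an interruptible rollout can be sketched as follows, assuming the worker records the producing policy version for every token (as ModelResponse.output_versions suggests) and that the KV cache is a simple mapping. PartialRollout, decode, interrupt_for_weight_update, and segments are hypothetical names; no real model or cache is involved.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class PartialRollout:
    """Hypothetical record of one in-flight generation."""
    prompt_ids: list[int]
    output_tokens: list[int] = field(default_factory=list)
    output_versions: list[int] = field(default_factory=list)


def decode(rollout: PartialRollout, new_tokens: list[int], version: int) -> None:
    # Record which policy version produced each generated token.
    rollout.output_tokens.extend(new_tokens)
    rollout.output_versions.extend([version] * len(new_tokens))


def interrupt_for_weight_update(rollout: PartialRollout, kv_cache: dict) -> int:
    # Keys/values cached under the old weights are no longer valid: drop them.
    kv_cache.clear()
    # A real engine would re-prefill prompt + generated tokens with the new
    # weights before resuming; this sketch only returns that context length.
    return len(rollout.prompt_ids) + len(rollout.output_tokens)


def segments(rollout: PartialRollout) -> list[tuple[int, int]]:
    """Collapse per-token versions into (version, num_tokens) runs."""
    return [(v, len(list(run))) for v, run in groupby(rollout.output_versions)]


rollout = PartialRollout(prompt_ids=[1, 2, 3])
kv_cache = {"seq0": "cached keys/values"}
decode(rollout, [10, 11], version=4)
recompute_len = interrupt_for_weight_update(rollout, kv_cache)
decode(rollout, [12], version=5)
print(recompute_len, kv_cache)  # 5 {}
print(segments(rollout))        # [(4, 2), (5, 1)]
```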

Getting Started

Installation

AReaL uses uv for high-performance dependency management.


Sources: README.md124-131 AGENTS.md11

Running Your First Experiment

Training is typically launched via scripts in the examples/ directory using configuration files.


Sources: README.md144-146

Next Steps