
URL: https://deepwiki.com/inclusionAI/AReaL/1.1-what-is-areal

What is AReaL | inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

What is AReaL

Purpose and Scope

This document introduces AReaL (Asynchronous Reinforcement Learning System), explaining its purpose, target use cases, core capabilities, and architectural components. AReaL is designed to provide a high-throughput, stable, and flexible infrastructure for aligning Large Language Models (LLMs) through reinforcement learning, with a specific focus on reasoning and agentic capabilities.

What is AReaL?

AReaL is an open-source, fully asynchronous reinforcement learning training system developed by researchers from Tsinghua IIIS and the AReaL Team at Ant Group [README.md15-17]. It is built to scale RL training for large reasoning models (LRMs) and AI agents by decoupling the generation of experience (rollouts) from the optimization of the model (training) [blog/AReaL_v0_3.md9-11].

Sources: README.md15-23 blog/AReaL_v0_3.md9-13

Technical Foundation

AReaL leverages a modern high-performance stack to ensure scalability and efficiency:

| Component | Technology | Reference |
| --- | --- | --- |
| Language | Python 3.12+ (managed via uv) | AGENTS.md7, CLAUDE.md8 |
| Deep Learning | PyTorch Native (FSDP2, Archon) | CLAUDE.md8, AGENTS.md7 |
| Training Backends | FSDP, Megatron-LM, Archon | CLAUDE.md14-15, AGENTS.md64-65 |
| Inference Backends | SGLang (Default), vLLM | blog/AReaL_v0_2.md60-63, CLAUDE.md14 |
| Infrastructure | Ray, SLURM, Local, SkyPilot | CLAUDE.md18-19, AGENTS.md66 |
| Hardware | NVIDIA CUDA, Ascend NPU | README.md82-87, CLAUDE.md42 |

Sources: README.md82-87 blog/AReaL_v0_2.md60-63 AGENTS.md7-66 CLAUDE.md8-19

Target Use Cases and Users

AReaL is optimized for scenarios where standard synchronous RL, in which GPUs sit idle during either generation or training, becomes a bottleneck [blog/AReaL_v0_3.md49-51].

  1. Reasoning Model Alignment: Training models to solve complex math, coding, and logic problems using algorithms such as GRPO (Group Relative Policy Optimization) or PPO [blog/AReaL_v0_1.md68-71, blog/AReaL_v0_2.md125-130].
  2. Agentic RL: Fine-tuning models to use tools and navigate multi-turn environments. AReaL supports OpenClaw and ZeroClaw patterns for external agent runtimes [README.md31-34, README.md61-63].
  3. Large-Scale Training: Scaling to 1,000+ GPUs using high-performance data transfer (NCCL with GPUDirect RDMA, abbreviated GDRDMA) over InfiniBand/RoCE [blog/AReaL_v0_2.md77-83].
  4. Self-Evolving Data Synthesis: Integration with engines such as AReaL-SEA for synthetic data generation and RL refinement [README.md68-74].
  5. Customer Service Agents: Specialized support for training agents on benchmarks such as $\tau^2$-Bench using multi-turn workflows [README.md68-74].
  6. Hardware Diversity: Native support for Ascend NPU devices via the ascend branch [README.md82-87].
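
Item 1 names GRPO. As a minimal sketch of the group-relative idea behind it (not AReaL's implementation), each sampled response to a prompt can be scored against the mean and standard deviation of the rewards in its own group. The function name, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    Hypothetical helper for illustration; `eps` avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one math prompt: 1.0 = correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, so no separate value network is needed to form the baseline.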

Sources: README.md31-87 blog/AReaL_v0_2.md77-83 blog/AReaL_v0_1.md68-71 blog/AReaL_v0_3.md49-51

Performance Highlights

  • Throughput: v0.3 ("boba²") achieves a 2.77× speedup over synchronous systems by decoupling generation and training clusters [blog/AReaL_v0_3.md9-11].
  • State-of-the-Art Results: Used to train SOTA 7B models for mathematical reasoning (61.9 on AIME 2024) and competitive 32B models with minimal data [blog/AReaL_v0_2.md11-31].
  • Variable-Length Optimization: Eliminates padding by packing sequences into 1D tensors, maximizing GPU memory utilization [blog/AReaL_v0_2.md69-75].

Sources: blog/AReaL_v0_3.md9-11 blog/AReaL_v0_2.md11-31 blog/AReaL_v0_2.md69-75

Core Capabilities

1. Fully Asynchronous Execution

The system decouples the InferenceEngine (Rollout) from the TrainEngine (Optimization). The Rollout Controller manages the flow between generation, reward assignment, and the replay buffer [blog/AReaL_v0_3.md110-117].
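
The decoupling described above can be illustrated with a small producer/consumer sketch. It is a toy model under stated assumptions, not AReaL code: `rollout_worker`, `trainer`, the dictionary-based `policy`, and the use of an `asyncio.Queue` as a stand-in for the replay buffer are all hypothetical.

```python
import asyncio


async def rollout_worker(queue, policy, num_samples):
    """Produce trajectories, tagging each with the policy version that made it."""
    for i in range(num_samples):
        await asyncio.sleep(0)  # stand-in for generation latency; yields control
        trajectory = {"id": i, "version": policy["version"], "reward": float(i % 2)}
        await queue.put(trajectory)
    await queue.put(None)  # sentinel: generation has finished


async def trainer(queue, policy, batch_size):
    """Consume trajectories in batches and bump the version after each step."""
    batch, consumed = [], []
    while True:
        item = await queue.get()
        if item is None:
            break  # a trailing partial batch would be dropped in this sketch
        batch.append(item)
        if len(batch) == batch_size:
            policy["version"] += 1  # stand-in for one optimizer step
            consumed.extend(batch)
            batch = []
    return consumed


async def main():
    queue = asyncio.Queue(maxsize=4)  # bounded buffer between the two sides
    policy = {"version": 0}
    _, consumed = await asyncio.gather(
        rollout_worker(queue, policy, num_samples=8),
        trainer(queue, policy, batch_size=2),
    )
    return policy, consumed


policy, consumed = asyncio.run(main())
```

Neither side waits for the other to finish a full phase: the producer keeps generating while the consumer trains, and the version tag on each trajectory records how stale it is by the time it is used.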

[Diagram: Asynchronous Training Pipeline Architecture]


Sources: blog/AReaL_v0_3.md85-117 CLAUDE.md12-24 AGENTS.md61-72

2. Multi-Backend Flexibility

AReaL provides modular engine abstractions configured via AllocationType or explicit backend fields [areal/api/alloc_mode.py16-21].

| Interface | Implementations | Reference |
| --- | --- | --- |
| Training | FSDPEngine, MegatronEngine, ArchonEngine | AGENTS.md87, CLAUDE.md14-15 |
| Inference | SGLangBackend, VLLMBackend | CLAUDE.md14, AGENTS.md7 |
| Algorithm | GRPO, PPO, SFT, DPO, DAPO | AGENTS.md86, CLAUDE.md131-132 |
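
To show what a pluggable engine abstraction of this kind can look like, here is a minimal sketch using an abstract base class and a name-to-class registry. Every name in it (`ToyTrainBackend`, `ToyFSDPBackend`, `ToyMegatronBackend`, `build_train_backend`) is hypothetical; the real AReaL engine classes and their configuration fields are not reproduced here.

```python
from abc import ABC, abstractmethod


class ToyTrainBackend(ABC):
    """Hypothetical common interface that every training backend implements."""

    @abstractmethod
    def train_step(self, batch):
        """Run one optimization step and return a short status string."""


class ToyFSDPBackend(ToyTrainBackend):
    def train_step(self, batch):
        return f"fsdp step on {len(batch)} samples"


class ToyMegatronBackend(ToyTrainBackend):
    def train_step(self, batch):
        return f"megatron step on {len(batch)} samples"


# Registry keyed by the backend name that would appear in a config file.
TRAIN_BACKENDS = {"fsdp": ToyFSDPBackend, "megatron": ToyMegatronBackend}


def build_train_backend(name):
    """Instantiate the backend selected by `name`, failing loudly if unknown."""
    try:
        return TRAIN_BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown training backend: {name!r}") from None


engine = build_train_backend("fsdp")
message = engine.train_step([1, 2, 3])
```

Because callers depend only on the shared interface, swapping the backend is a one-word configuration change rather than a code change.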

Sources: areal/api/alloc_mode.py16-21 AGENTS.md7-87 CLAUDE.md14-132

System Architecture Overview

Component Relationship Map

This diagram associates high-level system concepts with the configuration and engine entities defined in the codebase.

[Diagram: Code Entity Mapping: Natural Language to Code Space]


Sources: AGENTS.md61-70 CLAUDE.md12-19 areal/api/alloc_mode.py32-60

Data Flow: Sequence Packing

To handle variable sequence lengths efficiently, AReaL uses a dynamic allocation algorithm to pack sequences into 1D tensors without padding [blog/AReaL_v0_2.md71-75].
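
The packing idea can be sketched with plain Python lists: sequences are concatenated into one flat buffer, and a list of cumulative offsets records where each one starts and ends. This is an illustrative sketch only. The function names and the `cu_seqlens` variable name are assumptions, and the dynamic allocation of sequences to micro-batches mentioned above is not modeled.

```python
from itertools import accumulate


def pack_sequences(sequences):
    """Concatenate variable-length sequences into one flat list plus offsets."""
    flat = [tok for seq in sequences for tok in seq]
    # cu_seqlens[i] is the start of sequence i; the last entry is the total length.
    cu_seqlens = [0, *accumulate(len(seq) for seq in sequences)]
    return flat, cu_seqlens


def unpack_sequences(flat, cu_seqlens):
    """Recover the original sequences from the flat buffer and its offsets."""
    return [flat[start:end] for start, end in zip(cu_seqlens, cu_seqlens[1:])]


def pad_sequences(sequences, pad_id=0):
    """Conventional alternative: pad every sequence to the longest one."""
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (width - len(seq)) for seq in sequences]


sequences = [[11, 12, 13, 14, 15], [21, 22], [31, 32, 33]]
flat, cu_seqlens = pack_sequences(sequences)
padded_slots = sum(len(row) for row in pad_sequences(sequences))
packed_slots = len(flat)
```

For these three sequences the padded layout occupies 15 slots while the packed layout occupies 10, and the saving grows as sequence lengths become more uneven.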

[Diagram: Data Flow: Token Processing]


Sources: blog/AReaL_v0_2.md69-75 blog/AReaL_v0_2.md132-137 AGENTS.md71 CLAUDE.md23

Key Differentiators

Staleness and Versioning

In asynchronous RL, trajectories may be composed of segments produced by different model versions. AReaL manages this through:

  • WeightUpdateMeta: Metadata for weight synchronization across clusters.
  • Interruptible Rollout Workers: Rollout workers can interrupt ongoing generations to load new parameters via update_weights, discarding old KV caches [blog/AReaL_v0_3.md91-96].
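
The following toy model illustrates how an interrupted generation ends up with segments from different model versions. It is a sketch under stated assumptions: `ToyRolloutWorker`, its `generate` method, and the `updates` schedule are hypothetical, only the method name `update_weights` comes from the text above, and recomputing the cache for the already generated prefix after an interruption is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class ToyRolloutWorker:
    """Hypothetical worker that tracks a weight version and a KV cache."""

    version: int = 0
    kv_cache: list = field(default_factory=list)

    def update_weights(self, new_version):
        """Load new parameters; the cache built under old weights is discarded."""
        self.version = new_version
        self.kv_cache.clear()

    def generate(self, prompt, max_new_tokens, updates):
        """Decode step by step; `updates` maps a step index to a new version."""
        self.kv_cache = list(prompt)  # prefill
        tokens, versions = [], []
        for step in range(max_new_tokens):
            if step in updates:
                self.update_weights(updates[step])
                # Assumed behavior: rebuild the cache for prompt + prefix.
                self.kv_cache = list(prompt) + tokens
            token = f"t{step}"  # stand-in for a sampled token
            tokens.append(token)
            versions.append(self.version)  # which weights produced this token
            self.kv_cache.append(token)
        return tokens, versions


worker = ToyRolloutWorker()
tokens, versions = worker.generate(["p0", "p1"], max_new_tokens=6, updates={3: 1})
# Collapse per-token versions into (version, segment_length) pairs.
segments = [(v, len(list(group))) for v, group in groupby(versions)]
```

Here the weights change at step 3, so the single trajectory consists of three tokens from version 0 followed by three tokens from version 1.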

Sources: blog/AReaL_v0_3.md91-99

Advanced Parallelism

AReaL supports complex parallelism strategies beyond simple Data Parallelism (DP) via the ParallelStrategy class [areal/api/alloc_mode.py33-60].
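
As a rough illustration of what combining several parallelism dimensions involves, the sketch below multiplies per-dimension sizes into a total device count. `ToyParallelDims` is a hypothetical stand-in and does not reproduce the fields of the actual ParallelStrategy class. The tensor, pipeline, and context dimensions are assumed here as common examples of non-DP parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyParallelDims:
    """Hypothetical description of how devices are split across dimensions."""

    data: int = 1      # data-parallel replicas
    tensor: int = 1    # tensor-parallel shards (assumed dimension)
    pipeline: int = 1  # pipeline stages (assumed dimension)
    context: int = 1   # context/sequence-parallel shards (assumed dimension)

    def __post_init__(self):
        for name in ("data", "tensor", "pipeline", "context"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} parallel size must be >= 1")

    @property
    def world_size(self):
        """Total number of devices the layout needs."""
        return self.data * self.tensor * self.pipeline * self.context

    def fits(self, num_devices):
        """Check that the layout uses exactly the available devices."""
        return self.world_size == num_devices


dims = ToyParallelDims(data=4, tensor=2, pipeline=2)
required = dims.world_size
```

With 4 data-parallel replicas, 2 tensor shards, and 2 pipeline stages, the layout needs 16 devices, so a mismatch with the allocated cluster can be caught before launch.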

Sources: areal/api/alloc_mode.py33-160

