Last indexed: 7 May 2026 (2e12c1)

Asynchronous Training

This page describes AReaL's asynchronous training system, which decouples rollout generation from policy updates to achieve significant training speedups. For general RL algorithm details, see Algorithm Overview. For information about rollout workflows, see Workflow and Rollout System.

Purpose and Scope

Asynchronous training in AReaL enables concurrent execution of two traditionally sequential phases: rollout generation (inference on the policy network) and gradient-based training (policy updates). This architectural pattern achieves high training throughput while maintaining training stability through version tracking and staleness control.

This document covers:

The asynchronous execution architecture and its components.
Version tracking mechanisms for weight synchronization.
Offpolicyness management and staleness thresholds.
Weight update protocols (disk-based and distributed XCCL).
Coordination between PPOTrainer, RolloutController, and InferenceEngine.

Sources: areal/trainer/rl_trainer.py102-185 areal/trainer/rl_trainer.py200-250

Core Concept

Traditional RL training follows a synchronous pattern: Generate Rollouts → Wait → Train on Rollouts → Update Weights → Repeat

AReaL's asynchronous design (referred to as boba²) allows overlapping execution:

Thread 1: Generate Rollouts → Generate More Rollouts → ...
Thread 2: Train on Queue → Train on Queue → ...

Key Innovation: The rollout generation process proceeds independently of the training process. This is particularly beneficial when inference and training use different hardware configurations or when generation latency is high (e.g., multi-turn agentic workflows). The trainer uses versioning to tag the current state of the model after each optimization step. This architecture delivers up to 2.77× speedup compared to synchronous systems.
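The overlap described above can be illustrated with a minimal producer/consumer sketch. It uses only the Python standard library; `run_async_loop`, the queue size, and the integer "rollouts" are hypothetical stand-ins and do not reflect AReaL's actual trainer code.

```python
import queue
import threading


def run_async_loop(num_rollouts: int, batch_size: int) -> list:
    """Overlap rollout generation with training through a bounded queue."""
    rollout_queue: queue.Queue = queue.Queue(maxsize=8)
    trained_batches = []

    def generate_rollouts() -> None:
        # Producer: stands in for inference on the policy network.
        for rollout_id in range(num_rollouts):
            rollout_queue.put(rollout_id)  # blocks only when the queue is full
        rollout_queue.put(None)  # sentinel: no more rollouts

    def train_on_queue() -> None:
        # Consumer: stands in for gradient-based policy updates.
        batch = []
        while True:
            item = rollout_queue.get()
            if item is None:
                break
            batch.append(item)
            if len(batch) == batch_size:
                trained_batches.append(batch)
                batch = []
        if batch:
            trained_batches.append(batch)  # flush the final partial batch

    producer = threading.Thread(target=generate_rollouts)
    consumer = threading.Thread(target=train_on_queue)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return trained_batches
```

With a single producer and a single consumer on a FIFO queue, `run_async_loop(5, 2)` returns `[[0, 1], [2, 3], [4]]`. The bounded queue is a design choice in this sketch: it keeps generation from running arbitrarily far ahead of training.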

Sources: areal/trainer/rl_trainer.py102-152 areal/trainer/rl_trainer.py192-205 blog/AReaL_v0_2.md54-57

Architecture Overview

The following diagram maps the natural language concepts of the asynchronous loop to the specific code entities in the AReaL repository.

[Diagram: System Entity Map, Rollout to Training Flow]

Sources: areal/trainer/rl_trainer.py105-185 areal/trainer/rl_trainer.py200-240 areal/api/engine_api.py14-23 areal/trainer/rl_trainer.py37-42

Version Tracking System

Every weight update is assigned a monotonically increasing version number. This enables precise tracking of which policy version generated each rollout.

Implementation Details

Training Engine Side: The TrainEngine (e.g., FSDPEngine) maintains a version counter that increments after each optimization step. This ensures that every gradient update results in a unique identifier for the model's state.

Weight Update Metadata: The WeightUpdateMeta dataclass carries version information during weight synchronization.

type: Either "disk" or "xccl" (distributed communication) areal/api/engine_api.py21
version: The new version number after the update areal/api/engine_api.py21
lora_meta: Optional metadata for LoRA-based updates areal/api/engine_api.py22

Inference Engine Side: The inference backends (like RemoteSGLangEngine or RemotevLLMEngine areal/trainer/rl_trainer.py37) receive updates and track the current version to tag generated trajectories. This ensures the trainer knows exactly how "stale" a rollout is relative to the current parameters.
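The interaction between the three pieces above can be sketched as follows. This is an illustrative model, not the real API: the `*Sketch` classes are hypothetical, the metadata class mirrors only the three fields listed above, and the real `WeightUpdateMeta` in areal/api/engine_api.py may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class WeightUpdateMetaSketch:
    """Hypothetical stand-in carrying the three fields described above."""
    type: str                        # "disk" or "xccl"
    version: int                     # version number after the update
    lora_meta: Optional[Any] = None  # optional LoRA metadata


class TrainEngineSketch:
    """Increments a version counter after every optimization step."""

    def __init__(self) -> None:
        self.version = 0

    def optimizer_step(self) -> WeightUpdateMetaSketch:
        self.version += 1
        return WeightUpdateMetaSketch(type="disk", version=self.version)


class InferenceEngineSketch:
    """Tracks the loaded version and tags every generated trajectory."""

    def __init__(self) -> None:
        self.version = 0

    def update_weights(self, meta: WeightUpdateMetaSketch) -> None:
        if meta.version <= self.version:
            raise ValueError("weight versions must increase monotonically")
        self.version = meta.version

    def generate(self, prompt: str) -> dict:
        # The tag lets the trainer compute staleness later.
        return {"prompt": prompt, "version": self.version}
```

A rollout generated before the first update carries version 0; one generated after `update_weights(trainer.optimizer_step())` carries version 1.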

Sources: areal/api/engine_api.py14-23 areal/trainer/rl_trainer.py37-42 areal/trainer/rl_trainer.py129-144

Offpolicyness Management

Offpolicyness (staleness) measures how many weight updates have occurred since a rollout was generated. This is managed by comparing the version tag of the rollout with the current trainer version.

Staleness Check Logic

The staleness threshold is set through configuration parameters in PPOConfig areal/trainer/rl_trainer.py28 or GRPOConfig. If the version gap between a rollout and the current trainer exceeds the threshold, the data is considered too off-policy for stable gradient updates and is discarded. This is critical for algorithms like PPO that rely on importance sampling, which degrades as the policy diverges.
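A minimal sketch of the version-gap check follows. The function name, the `max_staleness` parameter, and the dict-based rollout format are assumptions made for illustration; the actual configuration field names are not given on this page.

```python
def filter_stale_rollouts(rollouts, current_version, max_staleness):
    """Keep rollouts whose version gap does not exceed the threshold.

    `rollouts` is a list of dicts with a "version" key (hypothetical format).
    """
    kept = []
    for rollout in rollouts:
        staleness = current_version - rollout["version"]
        if staleness <= max_staleness:
            kept.append(rollout)
    return kept
```

For example, with `current_version=5` and `max_staleness=2`, rollouts tagged with versions 3, 4, and 5 are kept, while a rollout tagged with version 2 (gap of 3) is dropped.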

Sources: areal/trainer/rl_trainer.py114-146 areal/api/cli_args.py28-36 CONTRIBUTING.md71

Weight Synchronization Mechanisms

AReaL supports two primary weight update protocols to sync the TrainEngine with the InferenceEngine.

1. Disk-Based Synchronization

The TrainEngine saves a checkpoint to a shared filesystem using the Saver class areal/trainer/rl_trainer.py59. The InferenceEngine then triggers a load from that path. This approach is simple but relies on high-performance shared storage (NFS/Lustre). The Saver also handles async staging to avoid blocking the training loop areal/trainer/sft_trainer.py181.
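The save-then-load handshake can be sketched with the standard library alone. This is not the Saver implementation: `save_weights`, `load_weights`, the JSON payload, and the file naming are hypothetical. The sketch writes to a temporary file and renames it, so that a reader on a POSIX filesystem does not observe a half-written checkpoint.

```python
import json
import os
import tempfile


def save_weights(directory: str, version: int, weights: dict) -> str:
    """Trainer side: write a versioned checkpoint, then publish it by rename."""
    final_path = os.path.join(directory, f"weights_v{version}.json")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"version": version, "weights": weights}, f)
    os.replace(tmp_path, final_path)  # rename within the same directory
    return final_path


def load_weights(path: str):
    """Inference side: load the checkpoint and return (version, weights)."""
    with open(path) as f:
        payload = json.load(f)
    return payload["version"], payload["weights"]
```

In this sketch the trainer would send only the returned path and version to the inference side, which then calls `load_weights` on the shared path.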

2. Distributed Synchronization (XCCL)

XCCL (Cross-Collective Communication) enables direct GPU-to-GPU weight transfers without filesystem intermediaries. This is highly efficient for multi-node clusters and reduces latency significantly. AReaL leverages NCCL with GPU-Direct RDMA (GDRDMA) to keep transfer overhead below 3 seconds in 1,000-GPU clusters blog/AReaL_v0_2.md77-83.

Implementation: The trainer initiates these updates by passing WeightUpdateMeta to the rollout system. The PPOTrainer manages the lifecycle and allocation of these engines areal/trainer/rl_trainer.py132-144.

Sources: areal/trainer/rl_trainer.py132-144 areal/api/engine_api.py14-23 areal/trainer/rl_trainer.py59-60 blog/AReaL_v0_2.md77-83

Data Transfer and RTensors

For large-scale asynchronous training, AReaL uses RTensor (Remote Tensor) to manage data movement between distributed components.

[Diagram: RTensor Communication Map]

The HttpRTensorBackend handles the storage and concurrent fetching of tensor shards via HTTP areal/infra/rpc/rtensor.py88-155. This allows the InferenceEngine to offload generated trajectories (logits, rewards, etc.) to a remote buffer, which the Trainer then fetches on demand during the optimization step.
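The store-then-fetch pattern can be sketched without any network code. Here an in-memory dict plays the role of the remote buffer and a thread pool stands in for concurrent HTTP requests; `ShardRef` and `InMemoryBackend` are hypothetical names and do not mirror the HttpRTensorBackend interface.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRef:
    """Lightweight handle passed around instead of the tensor data itself."""
    shard_id: str


class InMemoryBackend:
    """Stand-in for a remote backend; a dict replaces remote storage."""

    def __init__(self) -> None:
        self._store = {}

    def store(self, shard_id: str, values) -> ShardRef:
        # Producer side: offload the data and hand back only a reference.
        self._store[shard_id] = list(values)
        return ShardRef(shard_id)

    def fetch(self, refs):
        # Consumer side: resolve all references concurrently.
        # Executor.map returns results in the order of `refs`.
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(lambda ref: self._store[ref.shard_id], refs))
```

The point of the indirection is that only small references travel through the control path, while the bulk data moves once, directly to the component that needs it.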

Sources: areal/infra/rpc/rtensor.py25-155 tests/test_rtensor.py72-105 areal/infra/controller/train_controller.py23

Performance and Monitoring

The system uses perf_tracer and stats_tracker to monitor the overhead of asynchronous components areal/trainer/rl_trainer.py52.

Timing Categories: Category.IO for weight updates and Category.COMPUTE for training steps areal/trainer/sft_trainer.py173-190.
Online Mode: AReaL supports an online_mode where rollouts are generated on the fly via OpenAI-compatible APIs areal/trainer/rl_trainer.py154. In this mode, a special _EmptyDataLoader is used to drive the training loop areal/trainer/rl_trainer.py78-104.
Dataloader Management: StatefulDataLoader is used to ensure that data state is preserved across restarts areal/trainer/rl_trainer.py12.
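Per-category timing of the kind described in the first item can be sketched with a small context manager. This is an assumption-laden illustration: `CategorySketch` and `TimingTracker` are hypothetical and do not reproduce the perf_tracer or stats_tracker APIs.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from enum import Enum


class CategorySketch(Enum):
    """Hypothetical stand-in for the IO and COMPUTE timing categories."""
    IO = "io"
    COMPUTE = "compute"


class TimingTracker:
    """Accumulates wall-clock time and call counts per category."""

    def __init__(self) -> None:
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def scope(self, category: CategorySketch):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the wrapped block raises.
            self.totals[category] += time.perf_counter() - start
            self.counts[category] += 1
```

Wrapping a weight update in `tracker.scope(CategorySketch.IO)` and a training step in `tracker.scope(CategorySketch.COMPUTE)` makes it possible to compare how much wall-clock time each side of the asynchronous loop consumes.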

Component	Metric	Code Entity
Rollout Timing	`arun_episode`	`perf_tracer` areal/trainer/rl_trainer.py52
Trainer Sync	`WeightUpdateMeta`	`WeightUpdateMeta` areal/api/engine_api.py14-23
Engine RPC	`call_maybe_async`	`call_maybe_async` areal/infra/utils/concurrent.py51
Tensor Storage	`shard_id`	`TensorShardInfo` areal/infra/rpc/rtensor.py69-82

Sources: areal/trainer/rl_trainer.py52-60 areal/trainer/rl_trainer.py78-104 areal/api/engine_api.py14-23 areal/infra/utils/concurrent.py51 areal/trainer/sft_trainer.py173-191 areal/infra/rpc/rtensor.py69-82

URL: https://deepwiki.com/inclusionAI/AReaL/7.5-asynchronous-training