VOOZH about

URL: https://deepwiki.com/inclusionAI/AReaL/5.3-rollout-coordination

⇱ Rollout Coordination | inclusionAI/AReaL | DeepWiki


Loading...
Last indexed: 7 May 2026 (2e12c1)
Menu

Rollout Coordination

Overview

The RolloutController manages distributed rollout workers and coordinates trajectory generation for reinforcement learning training. It serves as the orchestration layer between trainers (like PPOTrainer areal/trainer/rl_trainer.py95) and the distributed inference infrastructure, handling:

The controller coordinates RemoteInfEngine instances on each worker areal/infra/remote_inf_engine.py27-32 which connect to inference backends like SGLang or vLLM using the RemoteInfBackendProtocol areal/infra/remote_inf_engine.py125-137 For multi-node training, the DistRolloutCoordinator areal/infra/dist_rollout.py97-100 provides additional logic for redistributing trajectories across data-parallel ranks to ensure load balancing.

Sources: areal/infra/controller/rollout_controller.py72-133 areal/infra/remote_inf_engine.py125-137 areal/infra/dist_rollout.py97-100

System Architecture

The RolloutController coordinates distributed rollout generation by managing inference workers and dispatching tasks across them.

Rollout Coordination Logic to Code Entities


Sources: areal/infra/controller/rollout_controller.py72-224 areal/infra/remote_inf_engine.py60-230

Distributed Load Balancing

When generating rollouts in a distributed environment, the DistRolloutCoordinator handles the redistribution of trajectories to prevent worker idling. This is particularly important when sequence lengths vary significantly, as it ensures each data-parallel rank processes a similar number of tokens.

Trajectory Redistribution Workflow


  1. Gather: Trajectories are gathered from all data-parallel heads using all_gather_tensor_container areal/infra/dist_rollout.py60
  2. Unpad: Pad positions are removed from each trajectory to compute accurate sequence lengths areal/infra/dist_rollout.py72-77
  3. Packing: The system computes sequence lengths and uses a packing algorithm (FFD or KK) to partition trajectories areal/infra/dist_rollout.py68-82
  4. Redistribution: The redistribute_trajectories function assigns trajectories to ranks based on token counts to balance the compute load areal/infra/dist_rollout.py29-58
  5. Broadcast: The resulting batch is broadcast to context and model parallel groups using broadcast_tensor_container to ensure all workers have the necessary data for the training step areal/infra/dist_rollout.py141-145

Sources: areal/infra/dist_rollout.py29-150 areal/utils/seqpack.py167-188

Packing Algorithms

The choice of packing algorithm in redistribute_trajectories areal/infra/dist_rollout.py32 impacts the load balance quality. These are implemented in areal/utils/seqpack.py:

AlgorithmKeyDescriptionBalance Quality
First Fit Decreasing (FFD)ffdGreedy bin-packing heuristic. Sorts sequences by length and assigns to the first bin with remaining capacity areal/utils/seqpack.py196-203Good
Karmarkar-Karp (KK)kkLargest Differencing Method. Iteratively merges imbalanced partial partitions using a max-heap areal/utils/seqpack.py163Excellent

KK is recommended for large-scale RL training where sequence lengths vary significantly, such as open-ended generation in PPO docs/en/algorithms/sequence_packing.md53-61 It significantly reduces the "spread" (difference between the most-loaded and least-loaded rank) docs/en/algorithms/sequence_packing.md56-57

Sources: areal/utils/seqpack.py157-188 docs/en/algorithms/sequence_packing.md1-68

Core Coordination Methods

prepare_batch()

Used by trainers to collect a batch of trajectories asynchronously. It draws samples from a StatefulDataLoader and submits them to the RolloutController dispatcher.

rollout_batch()

A method for generating rollouts with distributed coordination. In DistRolloutCoordinator, it ensures only data-parallel heads generate rollouts before redistributing them to all ranks areal/infra/dist_rollout.py152-190

_broadcast_and_redistribute_trajectories()

A helper in DistRolloutCoordinator that encapsulates the synchronization barriers and communication required to move trajectories from inference heads to training workers areal/infra/dist_rollout.py102-150 It handles redistribution within the data parallel group first, then broadcasts to context and model parallel groups areal/infra/dist_rollout.py109-145

Sources: areal/infra/dist_rollout.py102-190 areal/infra/controller/rollout_controller.py15-18

Callback Server and Weight Synchronization

The RolloutController runs a Flask-based callback server to handle control signals and worker feedback.

Weight Update Callback Routes

RouteDescription
/callback/init_weights_groupInitializes NCCL/XCCL groups on rollout workers areal/infra/remote_inf_engine.py215-218
/callback/update_weights_xcclTriggers distributed weight broadcast areal/infra/remote_inf_engine.py194-199
/callback/update_weights_diskCommands workers to load weights from a shared path areal/infra/remote_inf_engine.py177-180
/callback/rollout_completeSignals that a trajectory is ready for collection areal/infra/controller/rollout_controller.py119-120

The server is started in a separate daemon thread to avoid blocking the main controller logic areal/infra/controller/rollout_controller.py112

Sources: areal/infra/controller/rollout_controller.py109-117 areal/infra/remote_inf_engine.py177-218

Proxy Workers and Agentic RL

For workflows that involve external agents or OpenAI-compatible API calls, the RolloutController manages "Proxy Workers" and a ProxyRolloutServer areal/experimental/openai/proxy/proxy_rollout_server.py63

Sources: areal/infra/controller/rollout_controller.py123-133 areal/experimental/openai/proxy/proxy_rollout_server.py40-123

Staleness and Versioning

The StalenessManager tracks the version of the model on the controller and compares it with the version reported by trajectories areal/infra/controller/rollout_controller.py95-96

Sources: areal/infra/controller/rollout_controller.py95-102