Last indexed: 7 May 2026 (2e12c1)

Rollout Coordination

Overview

The RolloutController manages distributed rollout workers and coordinates trajectory generation for reinforcement learning training. It serves as the orchestration layer between trainers (like PPOTrainer areal/trainer/rl_trainer.py95) and the distributed inference infrastructure, handling:

Worker Management: Creates and manages inference workers via the scheduler areal/infra/controller/rollout_controller.py87-88
Task Dispatching: Distributes rollout tasks across workers using round-robin scheduling areal/infra/controller/rollout_controller.py91-92
Asynchronous Execution: Submits tasks and collects results via BatchTaskDispatcher areal/infra/controller/rollout_controller.py104-107
Callback Server: Runs a Flask-based HTTP server to coordinate weight updates and task completion areal/infra/controller/rollout_controller.py109-116
Proxy Workers: Manages proxy workers for agent-based workflows areal/infra/controller/rollout_controller.py123-126
Capacity Control: Throttles concurrent rollouts via StalenessManager areal/infra/controller/rollout_controller.py102

The controller coordinates RemoteInfEngine instances on each worker areal/infra/remote_inf_engine.py27-32 which connect to inference backends like SGLang or vLLM using the RemoteInfBackendProtocol areal/infra/remote_inf_engine.py125-137 For multi-node training, the DistRolloutCoordinator areal/infra/dist_rollout.py97-100 provides additional logic for redistributing trajectories across data-parallel ranks to ensure load balancing.

Sources: areal/infra/controller/rollout_controller.py72-133 areal/infra/remote_inf_engine.py125-137 areal/infra/dist_rollout.py97-100

System Architecture

The RolloutController coordinates distributed rollout generation by managing inference workers and dispatching tasks across them.

Rollout Coordination Logic to Code Entities

Sources: areal/infra/controller/rollout_controller.py72-224 areal/infra/remote_inf_engine.py60-230

Distributed Load Balancing

When generating rollouts in a distributed environment, the DistRolloutCoordinator handles the redistribution of trajectories to prevent worker idling. This is particularly important when sequence lengths vary significantly, as it ensures each data-parallel rank processes a similar number of tokens.

Trajectory Redistribution Workflow

Gather: Trajectories are gathered from all data-parallel heads using all_gather_tensor_container areal/infra/dist_rollout.py60
Unpad: Pad positions are removed from each trajectory to compute accurate sequence lengths areal/infra/dist_rollout.py72-77
Packing: The system computes sequence lengths and uses a packing algorithm (FFD or KK) to partition trajectories areal/infra/dist_rollout.py68-82
Redistribution: The redistribute_trajectories function assigns trajectories to ranks based on token counts to balance the compute load areal/infra/dist_rollout.py29-58
Broadcast: The resulting batch is broadcast to context and model parallel groups using broadcast_tensor_container to ensure all workers have the necessary data for the training step areal/infra/dist_rollout.py141-145

Sources: areal/infra/dist_rollout.py29-150 areal/utils/seqpack.py167-188

Packing Algorithms

The choice of packing algorithm in redistribute_trajectories areal/infra/dist_rollout.py32 impacts the load balance quality. These are implemented in areal/utils/seqpack.py:

Algorithm	Key	Description	Balance Quality
First Fit Decreasing (FFD)	`ffd`	Greedy bin-packing heuristic. Sorts sequences by length and assigns to the first bin with remaining capacity areal/utils/seqpack.py196-203	Good
Karmarkar-Karp (KK)	`kk`	Largest Differencing Method. Iteratively merges imbalanced partial partitions using a max-heap areal/utils/seqpack.py163	Excellent

KK is recommended for large-scale RL training where sequence lengths vary significantly, such as open-ended generation in PPO docs/en/algorithms/sequence_packing.md53-61 It significantly reduces the "spread" (difference between the most-loaded and least-loaded rank) docs/en/algorithms/sequence_packing.md56-57

Sources: areal/utils/seqpack.py157-188 docs/en/algorithms/sequence_packing.md1-68

Core Coordination Methods

prepare_batch()

Used by trainers to collect a batch of trajectories asynchronously. It draws samples from a StatefulDataLoader and submits them to the RolloutController dispatcher.

rollout_batch()

A method for generating rollouts with distributed coordination. In DistRolloutCoordinator, it ensures only data-parallel heads generate rollouts before redistributing them to all ranks areal/infra/dist_rollout.py152-190

_broadcast_and_redistribute_trajectories()

A helper in DistRolloutCoordinator that encapsulates the synchronization barriers and communication required to move trajectories from inference heads to training workers areal/infra/dist_rollout.py102-150 It handles redistribution within the data parallel group first, then broadcasts to context and model parallel groups areal/infra/dist_rollout.py109-145

Sources: areal/infra/dist_rollout.py102-190 areal/infra/controller/rollout_controller.py15-18

Callback Server and Weight Synchronization

The RolloutController runs a Flask-based callback server to handle control signals and worker feedback.

Weight Update Callback Routes

Route	Description
`/callback/init_weights_group`	Initializes NCCL/XCCL groups on rollout workers areal/infra/remote_inf_engine.py215-218
`/callback/update_weights_xccl`	Triggers distributed weight broadcast areal/infra/remote_inf_engine.py194-199
`/callback/update_weights_disk`	Commands workers to load weights from a shared path areal/infra/remote_inf_engine.py177-180
`/callback/rollout_complete`	Signals that a trajectory is ready for collection areal/infra/controller/rollout_controller.py119-120

The server is started in a separate daemon thread to avoid blocking the main controller logic areal/infra/controller/rollout_controller.py112

Sources: areal/infra/controller/rollout_controller.py109-117 areal/infra/remote_inf_engine.py177-218

Proxy Workers and Agentic RL

For workflows that involve external agents or OpenAI-compatible API calls, the RolloutController manages "Proxy Workers" and a ProxyRolloutServer areal/experimental/openai/proxy/proxy_rollout_server.py63

Colocation: Proxy workers are typically forked and colocated with rollout workers to minimize latency areal/infra/controller/rollout_controller.py123-126
Gateway: An optional proxy gateway server routes external requests to the distributed proxy workers areal/infra/controller/rollout_controller.py128-133
OpenAI Compatibility: The ProxyRolloutServer provides /chat/completions and /rl/start_session endpoints to support agentic interactions areal/experimental/openai/proxy/proxy_rollout_server.py40-57

Sources: areal/infra/controller/rollout_controller.py123-133 areal/experimental/openai/proxy/proxy_rollout_server.py40-123

Staleness and Versioning

The StalenessManager tracks the version of the model on the controller and compares it with the version reported by trajectories areal/infra/controller/rollout_controller.py95-96

Version Tracking: The controller maintains a _version counter that increments on weight updates areal/infra/controller/rollout_controller.py96
Capacity: The manager controls global capacity to prevent inference engine saturation areal/infra/controller/rollout_controller.py206-209

Sources: areal/infra/controller/rollout_controller.py95-102

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/5.3-rollout-coordination