Last indexed: 7 May 2026 (2e12c1)

Workflow and Rollout System

Purpose and Scope

This document describes the workflow and rollout system in AReaL, which defines how individual training episodes are generated and evaluated. A workflow encapsulates the logic for generating model responses, computing rewards, and assembling trajectories for training. This page covers the workflow interface, built-in implementations, episode execution flow, and integration with the broader system.

For information about implementing custom workflows, see Implementing Custom Workflows. For details on how the DistRolloutCoordinator manages distributed rollout execution, see Rollout Coordination. For specific workflow implementations including multi-turn and vision workflows, see Built-in Workflows.

Sources: areal/api/workflow_api.py1-114 areal/api/engine_api.py197-203

Workflow Concept and Interface

What is a Workflow

A workflow in AReaL defines the complete procedure for generating a single training episode, from initial prompt construction through final reward computation. Each workflow implements an arun_episode method that:

Constructs prompts from input data.
Calls the inference engine to generate responses.
Computes rewards based on the generated output.
Assembles the result into a trajectory dictionary.

Workflows are the primary abstraction for defining task-specific behavior in AReaL's RL training pipeline.

RolloutWorkflow Interface

The abstract RolloutWorkflow class defines the core workflow interface:

Component	Type	Description
`engine`	`InferenceEngine`	The inference engine for generating responses areal/api/workflow_api.py29-30
`data`	`dict[str, Any]`	Input data for the episode (e.g., prompt, ground truth) areal/api/workflow_api.py31-32
Return	`dict[str, Any]` \| `None`	Trajectory dictionary with tensors, or `None` to reject areal/api/workflow_api.py34-37

Key behaviors:

Rejection: Returning None implies that this trajectory is rejected and will not be used for training areal/api/workflow_api.py21-23
Asynchronous: All workflows use async to support concurrent episode generation areal/api/workflow_api.py16
Session tracking: Workflows integrate with SessionTracer via decorators for performance monitoring areal/workflow/rlvr.py21-25

Sources: areal/api/workflow_api.py14-39 areal/workflow/rlvr.py138-177

Episode Structure

The trajectory dictionary returned by arun_episode must contain specific tensor fields:

Field	Shape	Type	Description
`input_ids`	`[batch, seq_len]`	`torch.int32`	Full sequence (prompt + completion)
`loss_mask`	`[batch, seq_len]`	`torch.int32`	0 for prompt tokens, 1 for output tokens
`logprobs`	`[batch, seq_len]`	`torch.float32`	Log probabilities (0.0 for prompt tokens)
`versions`	`[batch, seq_len]`	`torch.int32`	Model version used for each token
`rewards`	`[batch]`	`torch.float32`	Scalar reward for the episode
`attention_mask`	`[batch, seq_len]`	`torch.bool`	Attention mask (typically all 1s)

For vision workflows, an additional multi_modal_input field contains image tensors and grid metadata areal/workflow/vision_rlvr.py146-152

Sources: areal/workflow/rlvr.py169-177 areal/workflow/vision_rlvr.py154-162

Built-in Workflow Implementations

Workflow Hierarchy Diagram

Sources: areal/api/workflow_api.py14-103 areal/workflow/rlvr.py48-177 areal/workflow/vision_rlvr.py26-162 areal/workflow/multi_turn.py18-137 examples/tir/tir_workflow.py47-108

RLVRWorkflow (Single-turn)

RLVRWorkflow implements single-turn reinforcement learning with verifiable rewards. It supports optional "thinking" tokens via chat templates areal/workflow/rlvr.py49

Parameter	Type	Description
`reward_fn`	`Callable` \| `str`	Reward function or import path areal/workflow/rlvr.py62
`gconfig`	`GenerationHyperparameters`	Generation configuration areal/workflow/rlvr.py54
`tokenizer`	`PreTrainedTokenizerFast` \| `str`	Tokenizer or model ID areal/workflow/rlvr.py55
`enable_thinking`	`bool`	Enable thinking tokens (default: `False`) areal/workflow/rlvr.py56

Execution flow:

Tokenize prompt using get_input_ids_fn areal/workflow/rlvr.py146-150
Create ModelRequest with generation config areal/workflow/rlvr.py151-156
Call engine.agenerate() via _collect_samples to generate response areal/workflow/rlvr.py130
Decode output and compute reward via reward_fn areal/workflow/rlvr.py132
Assemble trajectory tensors with batch dimension 1 areal/workflow/rlvr.py169-177

Sources: areal/workflow/rlvr.py48-177

VisionRLVRWorkflow (Multi-modal)

VisionRLVRWorkflow extends RLVRWorkflow for vision-language models. It processes image inputs using a processor and constructs multi-modal requests areal/workflow/vision_rlvr.py26

Parameter	Type	Description
`processor`	`AutoProcessor` \| `str`	Vision processor for image encoding areal/workflow/vision_rlvr.py31

Key differences from RLVRWorkflow:

Uses processor to encode images into pixel_values and image_grid_thw tensors areal/workflow/vision_rlvr.py111-117
Adds multi_modal_input field containing vision tensors to trajectory dict areal/workflow/vision_rlvr.py154-158
Converts images to base64 for transport to inference engine areal/workflow/vision_rlvr.py121

Sources: areal/workflow/vision_rlvr.py26-162

MultiTurnWorkflow (Retry)

MultiTurnWorkflow implements multi-turn dialog with automatic retry until a correct answer is produced or max turns is reached areal/workflow/multi_turn.py18

Parameter	Type	Description
`max_turns`	`int`	Maximum retry attempts areal/workflow/multi_turn.py26
`turn_discount`	`float`	Discount factor per turn (0, 1] areal/workflow/multi_turn.py27

Execution flow:

Generate initial response areal/workflow/multi_turn.py86
Compute reward (typically 0 for incorrect, 1 for correct) areal/workflow/multi_turn.py90-96
If incorrect and turns remain:
- Append retry prompt to conversation areal/workflow/multi_turn.py119
- Retry with extended context areal/workflow/multi_turn.py78
- Apply discount: reward *= turn_discount areal/workflow/multi_turn.py122
Concatenate all turns into single trajectory with unified loss_mask areal/workflow/multi_turn.py105-108

Sources: areal/workflow/multi_turn.py18-137

Specialized Agentic Workflows

AReaL supports complex agentic behaviors through specialized workflows:

TIRWorkflow (Tool-Integrated Reasoning): Manages multi-turn tool calling (e.g., Python execution). It uses a ToolManager to parse tool calls from model output and feed results back as new turns examples/tir/tir_workflow.py47-78
ScaffoldingWorkflow: A modular framework where generation and reward logic are decoupled into Controller and Worker objects. It supports advanced scenarios like LLM-as-a-judge rewards and complex search-agent tool loops examples/scaffolding/workflow.py86-130

Episode Execution Flow

Episode Lifecycle Diagram

Sources: areal/workflow/rlvr.py111-136 areal/api/engine_api.py197-203

Code Entity Space Bridge

The following diagram maps high-level rollout concepts to specific code entities and data structures used during execution.

Sources: areal/api/engine_api.py10-50 areal/utils/hf_utils.py36-47 areal/workflow/rlvr.py100-107 areal/api/workflow_api.py42-70

Session Context and Phase Tracking

Workflows use session context managers and decorators to track execution phases automatically for profiling areal/workflow/rlvr.py82-136:

Session tracking features:

@session_context(): Associates the current async context with a unique session ID.
@trace_session(phase): Automatically logs the start and end of a specific phase (e.g., "reward") areal/workflow/rlvr.py83
atrace_session_phase(phase): Async context manager for manual phase boundary marking areal/workflow/rlvr.py130

Sources: areal/workflow/rlvr.py82-136

Integration with System Components

Trajectory Assembly

Workflows construct trajectory dictionaries by concatenating prompt and completion tokens and aligning metadata areal/workflow/rlvr.py164-177:

Key invariants:

loss_mask: Zeros out prompt tokens so the model is only trained on generated content areal/workflow/rlvr.py167
versions: Tracks which model weight version generated each token, used for off-policy correction areal/workflow/rlvr.py168

Sources: areal/workflow/rlvr.py164-177

WorkflowLike Type Alias

Workflows can be specified in multiple ways via the WorkflowLike type alias areal/api/workflow_api.py110-115:

Sources: areal/api/workflow_api.py110-115

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/5-workflow-and-rollout-system