URL: https://deepwiki.com/inclusionAI/AReaL/5-workflow-and-rollout-system

Last indexed: 7 May 2026 (2e12c1)

Workflow and Rollout System

Purpose and Scope

This document describes the workflow and rollout system in AReaL, which defines how individual training episodes are generated and evaluated. A workflow encapsulates the logic for generating model responses, computing rewards, and assembling trajectories for training. This page covers the workflow interface, built-in implementations, episode execution flow, and integration with the broader system.

For information about implementing custom workflows, see Implementing Custom Workflows. For details on how the DistRolloutCoordinator manages distributed rollout execution, see Rollout Coordination. For specific workflow implementations including multi-turn and vision workflows, see Built-in Workflows.

Sources: areal/api/workflow_api.py:1-114, areal/api/engine_api.py:197-203


Workflow Concept and Interface

What is a Workflow

A workflow in AReaL defines the complete procedure for generating a single training episode, from initial prompt construction through final reward computation. Each workflow implements an arun_episode method that:

  1. Constructs prompts from input data.
  2. Calls the inference engine to generate responses.
  3. Computes rewards based on the generated output.
  4. Assembles the result into a trajectory dictionary.

Workflows are the primary abstraction for defining task-specific behavior in AReaL's RL training pipeline.

RolloutWorkflow Interface

The abstract RolloutWorkflow class defines the core workflow interface. Its arun_episode method takes and returns the following:

Component  Type                   Description
engine     InferenceEngine        The inference engine for generating responses (areal/api/workflow_api.py:29-30)
data       dict[str, Any]         Input data for the episode, e.g., prompt and ground truth (areal/api/workflow_api.py:31-32)
Return     dict[str, Any] | None  Trajectory dictionary with tensors, or None to reject (areal/api/workflow_api.py:34-37)
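
The interface in the table above can be sketched as follows. This is a minimal illustration, not AReaL's source: the InferenceEngine class here is an empty placeholder, and RejectEmptyPrompt is a hypothetical workflow invented to show the None-to-reject return path.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Any


class InferenceEngine:
    """Empty placeholder for the engine type (assumption, not the real class)."""


class RolloutWorkflow(ABC):
    """Stand-in for the abstract workflow interface described above."""

    @abstractmethod
    async def arun_episode(
        self, engine: InferenceEngine, data: dict[str, Any]
    ) -> dict[str, Any] | None:
        """Run one episode; return a trajectory dict, or None to reject it."""


class RejectEmptyPrompt(RolloutWorkflow):
    """Hypothetical workflow that rejects episodes with no prompt."""

    async def arun_episode(self, engine, data):
        if not data.get("prompt"):
            return None  # rejected episode
        # A real workflow would generate and score a response here.
        return {"prompt": data["prompt"]}


workflow = RejectEmptyPrompt()
accepted = asyncio.run(workflow.arun_episode(InferenceEngine(), {"prompt": "2+2=?"}))
rejected = asyncio.run(workflow.arun_episode(InferenceEngine(), {"prompt": ""}))
```

Here accepted holds a dictionary and rejected is None, which mirrors the two return cases listed in the table.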

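Key behaviors: arun_episode is called once per episode, and returning None instead of a trajectory dictionary rejects that episode.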

Sources: areal/api/workflow_api.py:14-39, areal/workflow/rlvr.py:138-177

Episode Structure

The trajectory dictionary returned by arun_episode must contain specific tensor fields:

Field           Shape             Type           Description
input_ids       [batch, seq_len]  torch.int32    Full sequence (prompt + completion)
loss_mask       [batch, seq_len]  torch.int32    0 for prompt tokens, 1 for output tokens
logprobs        [batch, seq_len]  torch.float32  Log probabilities (0.0 for prompt tokens)
versions        [batch, seq_len]  torch.int32    Model version used for each token
rewards         [batch]           torch.float32  Scalar reward for the episode
attention_mask  [batch, seq_len]  torch.bool     Attention mask (typically all 1s)

For vision workflows, an additional multi_modal_input field contains image tensors and grid metadata (areal/workflow/vision_rlvr.py:146-152).
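
To make the field layout concrete, here is a small sketch that builds the same fields from plain Python lists. It is an illustration only: real trajectories hold torch tensors with the dtypes listed above, the function name build_trajectory is hypothetical, and using -1 as the version for prompt positions is an assumption that the table does not specify.

```python
def build_trajectory(prompt_ids, output_ids, output_logprobs, output_versions, reward):
    """Assemble one episode as nested lists shaped [1, seq_len] (batch size 1)."""
    n_prompt, n_out = len(prompt_ids), len(output_ids)
    seq = list(prompt_ids) + list(output_ids)
    return {
        "input_ids": [seq],                                         # prompt + completion
        "loss_mask": [[0] * n_prompt + [1] * n_out],                # train on outputs only
        "logprobs": [[0.0] * n_prompt + list(output_logprobs)],     # 0.0 on prompt tokens
        "versions": [[-1] * n_prompt + list(output_versions)],      # -1 is an assumption
        "rewards": [reward],                                        # one scalar per episode
        "attention_mask": [[True] * len(seq)],                      # all positions attended
    }


traj = build_trajectory(
    prompt_ids=[1, 2, 3],
    output_ids=[4, 5],
    output_logprobs=[-0.1, -0.2],
    output_versions=[7, 7],
    reward=1.0,
)
```

Every per-token field ends up with the same length (5 in this example), which is what lets the fields be stacked into [batch, seq_len] tensors.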

Sources: areal/workflow/rlvr.py:169-177, areal/workflow/vision_rlvr.py:154-162


Built-in Workflow Implementations

Workflow Hierarchy Diagram


Sources: areal/api/workflow_api.py:14-103, areal/workflow/rlvr.py:48-177, areal/workflow/vision_rlvr.py:26-162, areal/workflow/multi_turn.py:18-137, examples/tir/tir_workflow.py:47-108

RLVRWorkflow (Single-turn)

RLVRWorkflow implements single-turn reinforcement learning with verifiable rewards. It supports optional "thinking" tokens via chat templates (areal/workflow/rlvr.py:49).

Parameter        Type                           Description
reward_fn        Callable | str                 Reward function or import path (areal/workflow/rlvr.py:62)
gconfig          GenerationHyperparameters      Generation configuration (areal/workflow/rlvr.py:54)
tokenizer        PreTrainedTokenizerFast | str  Tokenizer or model ID (areal/workflow/rlvr.py:55)
enable_thinking  bool                           Enable thinking tokens, default False (areal/workflow/rlvr.py:56)

Execution flow:

  1. Tokenize the prompt using get_input_ids_fn (areal/workflow/rlvr.py:146-150).
  2. Create a ModelRequest with the generation config (areal/workflow/rlvr.py:151-156).
  3. Call engine.agenerate() via _collect_samples to generate the response (areal/workflow/rlvr.py:130).
  4. Decode the output and compute the reward via reward_fn (areal/workflow/rlvr.py:132).
  5. Assemble trajectory tensors with batch dimension 1 (areal/workflow/rlvr.py:169-177).
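
The five steps above can be sketched end to end with stand-ins. Everything in this block is hypothetical scaffolding: FakeEngine, FakeRequest, and FakeResponse replace the real engine and ModelRequest types, the "tokenizer" maps characters to code points, and exact_match_reward is an invented reward function. Only the order of the steps comes from the list above.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class FakeRequest:  # stand-in for a generation request
    input_ids: list[int]
    max_new_tokens: int


@dataclass
class FakeResponse:  # stand-in for the engine's reply
    output_tokens: list[int]
    output_logprobs: list[float]


class FakeEngine:
    async def agenerate(self, req: FakeRequest) -> FakeResponse:
        # Scripted "model": always answers with the single character "4".
        tokens = [ord(c) for c in "4"][: req.max_new_tokens]
        return FakeResponse(tokens, [-0.5] * len(tokens))


def exact_match_reward(completion: str, answer: str) -> float:
    return 1.0 if completion.strip() == answer else 0.0


async def run_single_turn(engine, data, max_new_tokens=8):
    prompt_ids = [ord(c) for c in data["prompt"]]               # 1. tokenize
    req = FakeRequest(prompt_ids, max_new_tokens)                # 2. build request
    resp = await engine.agenerate(req)                           # 3. generate
    completion = "".join(chr(t) for t in resp.output_tokens)     # 4. decode
    reward = exact_match_reward(completion, data["answer"])      #    and score
    seq = prompt_ids + resp.output_tokens                        # 5. assemble
    return {
        "input_ids": [seq],
        "loss_mask": [[0] * len(prompt_ids) + [1] * len(resp.output_tokens)],
        "logprobs": [[0.0] * len(prompt_ids) + resp.output_logprobs],
        "rewards": [reward],
    }


traj = asyncio.run(run_single_turn(FakeEngine(), {"prompt": "2+2=", "answer": "4"}))
```

The sketch omits the versions and attention_mask fields for brevity; a full trajectory would carry them as well.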

Sources: areal/workflow/rlvr.py:48-177

VisionRLVRWorkflow (Multi-modal)

VisionRLVRWorkflow extends RLVRWorkflow for vision-language models. It processes image inputs using a processor and constructs multi-modal requests (areal/workflow/vision_rlvr.py:26).

Parameter  Type                 Description
processor  AutoProcessor | str  Vision processor for image encoding (areal/workflow/vision_rlvr.py:31)

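Key differences from RLVRWorkflow: it takes an additional processor argument, encodes image inputs into multi-modal requests, and adds a multi_modal_input field (image tensors and grid metadata) to the trajectory.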

Sources: areal/workflow/vision_rlvr.py:26-162

MultiTurnWorkflow (Retry)

MultiTurnWorkflow implements multi-turn dialogue that retries automatically until the model produces a correct answer or max_turns is reached (areal/workflow/multi_turn.py:18).

Parameter      Type   Description
max_turns      int    Maximum retry attempts (areal/workflow/multi_turn.py:26)
turn_discount  float  Discount factor per turn, in (0, 1] (areal/workflow/multi_turn.py:27)

Execution flow:

  1. Generate the initial response (areal/workflow/multi_turn.py:86).
  2. Compute the reward, typically 0 for incorrect and 1 for correct (areal/workflow/multi_turn.py:90-96).
  3. If the answer is incorrect and turns remain, retry with another turn; turn_discount is applied per turn.
  4. Concatenate all turns into a single trajectory with a unified loss_mask (areal/workflow/multi_turn.py:105-108).
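
A rough sketch of this retry loop follows. It rests on several assumptions that the list above does not confirm: the reward is multiplied by turn_discount once per failed turn, a fixed feedback string is appended between turns, and characters stand in for tokens. The attempts argument is a scripted list of answers replacing real generation, and run_with_retries is a hypothetical name.

```python
def run_with_retries(prompt, attempts, answer, max_turns, turn_discount,
                     feedback=" Try again."):
    """Simulate retrying until correct, building one sequence and one loss mask."""
    seq = list(prompt)
    loss_mask = [0] * len(seq)          # prompt tokens are never trained on
    discount, reward, turns = 1.0, 0.0, 0

    for attempt in attempts[:max_turns]:
        turns += 1
        seq += list(attempt)
        loss_mask += [1] * len(attempt)  # model output: included in the loss
        reward = 1.0 if attempt == answer else 0.0
        if reward > 0 or turns == max_turns:
            break
        seq += list(feedback)
        loss_mask += [0] * len(feedback)  # feedback text: excluded from the loss
        discount *= turn_discount         # assumed rule: one discount per failed turn

    return {"tokens": seq, "loss_mask": loss_mask,
            "reward": reward * discount, "turns": turns}


result = run_with_retries(
    prompt="2+2=", attempts=["5", "4"], answer="4",
    max_turns=3, turn_discount=0.5,
)
```

In this run the first attempt is wrong and the second is right, so the final reward is 1.0 discounted once, and only the two generated characters carry a loss_mask of 1.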

Sources: areal/workflow/multi_turn.py:18-137

Specialized Agentic Workflows

AReaL supports complex agentic behaviors through specialized workflows:

  • TIRWorkflow (Tool-Integrated Reasoning): Manages multi-turn tool calling (e.g., Python execution). It uses a ToolManager to parse tool calls from model output and feed results back as new turns (examples/tir/tir_workflow.py:47-78).
  • ScaffoldingWorkflow: A modular framework where generation and reward logic are decoupled into Controller and Worker objects. It supports advanced scenarios like LLM-as-a-judge rewards and complex search-agent tool loops (examples/scaffolding/workflow.py:86-130).
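
The tool-calling pattern in the first bullet (parse a call from model output, run it, feed the result back as a new turn) can be illustrated with a toy loop. None of this reflects the actual ToolManager API: the `<tool>name(args)</tool>` syntax, the TOOLS registry, and both function names are invented for the example, and the model outputs are scripted rather than generated.

```python
import re

# Hypothetical call syntax: <tool>name(int, int)</tool>
TOOL_PATTERN = re.compile(r"<tool>(\w+)\((.*?)\)</tool>")

# Hypothetical tool registry.
TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def parse_tool_call(text):
    """Return (tool_name, int_args) if the text contains a call, else None."""
    match = TOOL_PATTERN.search(text)
    if match is None:
        return None
    name, raw_args = match.groups()
    return name, [int(arg) for arg in raw_args.split(",")]


def run_tool_loop(model_turns, max_turns=4):
    """Alternate model turns and tool results until the model stops calling tools."""
    transcript = []
    for output in model_turns[:max_turns]:
        transcript.append(("model", output))
        call = parse_tool_call(output)
        if call is None:
            break  # no tool call: treat this turn as the final answer
        name, args = call
        # In a real loop the result would be appended to the next prompt.
        transcript.append(("tool", str(TOOLS[name](*args))))
    return transcript


transcript = run_tool_loop(["<tool>add(2, 3)</tool>", "The answer is 5."])
```

The resulting transcript interleaves model turns with tool results, which is the structure a workflow would then flatten into a single trajectory.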

Episode Execution Flow

Episode Lifecycle Diagram


Sources: areal/workflow/rlvr.py:111-136, areal/api/engine_api.py:197-203

Code Entity Space Bridge

[Diagram: maps high-level rollout concepts to the specific code entities and data structures used during execution.]


Sources: areal/api/engine_api.py:10-50, areal/utils/hf_utils.py:36-47, areal/workflow/rlvr.py:100-107, areal/api/workflow_api.py:42-70

Session Context and Phase Tracking

Workflows use session context managers and decorators to track execution phases automatically for profiling (areal/workflow/rlvr.py:82-136).


Session tracking features:

  • @session_context(): Associates the current async context with a unique session ID.
  • @trace_session(phase): Automatically logs the start and end of a specific phase, e.g., "reward" (areal/workflow/rlvr.py:83).
  • atrace_session_phase(phase): Async context manager for manual phase boundary marking (areal/workflow/rlvr.py:130).

Sources: areal/workflow/rlvr.py:82-136


Integration with System Components

Trajectory Assembly

Workflows construct trajectory dictionaries by concatenating prompt and completion tokens and aligning metadata (areal/workflow/rlvr.py:164-177).


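Key invariants: input_ids, loss_mask, logprobs, versions, and attention_mask share the shape [batch, seq_len]; prompt positions have loss_mask 0 and logprobs 0.0; and rewards holds one scalar per episode.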
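
The shape and masking rules from the Episode Structure table can be checked mechanically. The following is a sketch under simplifying assumptions: check_trajectory is a hypothetical helper, and it operates on nested Python lists rather than torch tensors so that it stays dependency-free.

```python
SEQ_FIELDS = ("input_ids", "loss_mask", "logprobs", "versions", "attention_mask")


def check_trajectory(traj):
    """Return a list of violation messages; an empty list means all checks passed."""
    errors = []
    batch = len(traj["rewards"])  # rewards has one entry per episode

    # Every per-token field must have one row per episode.
    for name in SEQ_FIELDS:
        if len(traj[name]) != batch:
            errors.append(f"{name}: expected {batch} rows, got {len(traj[name])}")
    if errors:
        return errors

    for i in range(batch):
        # All per-token fields must agree on seq_len within a row.
        lengths = {len(traj[name][i]) for name in SEQ_FIELDS}
        if len(lengths) != 1:
            errors.append(f"row {i}: sequence fields differ in length")
            continue
        # Prompt positions (loss_mask == 0) must carry a logprob of 0.0.
        pairs = zip(traj["loss_mask"][i], traj["logprobs"][i])
        for pos, (mask, logprob) in enumerate(pairs):
            if mask == 0 and logprob != 0.0:
                errors.append(f"row {i}, position {pos}: prompt token has a nonzero logprob")
    return errors


good = {
    "input_ids": [[1, 2, 3]],
    "loss_mask": [[0, 0, 1]],
    "logprobs": [[0.0, 0.0, -0.1]],
    "versions": [[-1, -1, 4]],
    "attention_mask": [[True, True, True]],
    "rewards": [1.0],
}
bad = dict(good, logprobs=[[-0.3, 0.0, -0.1]])

good_errors = check_trajectory(good)
bad_errors = check_trajectory(bad)
```

A check like this is most useful in tests for a custom workflow, where a misaligned mask would otherwise only show up later as a confusing training error.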

Sources: areal/workflow/rlvr.py:164-177

WorkflowLike Type Alias

Workflows can be specified in multiple ways via the WorkflowLike type alias (areal/api/workflow_api.py:110-115).
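
The members of the alias are not shown on this page, so the sketch below assumes one plausible shape: a workflow instance, a workflow class, or a dotted import-path string. The classes, the alias definition, and the resolve_workflow helper are all hypothetical and only illustrate how such a union could be normalized into an instance.

```python
import importlib
from typing import Type, Union


class RolloutWorkflow:
    """Stand-in base class (not the real abstract class)."""


class EchoWorkflow(RolloutWorkflow):
    """Hypothetical workflow used only to exercise the resolver."""

    def __init__(self, greeting="hi"):
        self.greeting = greeting


# Assumed shape of the alias: instance, class, or "package.module.ClassName".
WorkflowLike = Union[RolloutWorkflow, Type[RolloutWorkflow], str]


def import_from_string(path):
    """Import an attribute from a dotted path such as 'package.module.Name'."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve_workflow(spec, **kwargs):
    """Turn any assumed WorkflowLike value into a workflow instance."""
    if isinstance(spec, RolloutWorkflow):
        return spec                        # already an instance
    if isinstance(spec, str):
        spec = import_from_string(spec)    # import path -> class
    if isinstance(spec, type) and issubclass(spec, RolloutWorkflow):
        return spec(**kwargs)              # class -> instance
    raise TypeError(f"not a workflow: {spec!r}")


instance = EchoWorkflow()
from_instance = resolve_workflow(instance)
from_class = resolve_workflow(EchoWorkflow, greeting="hello")
```

Accepting a string form is convenient for configuration files, since a workflow can then be named without importing it at config-parse time.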


Sources: areal/api/workflow_api.py:110-115