Last indexed: 7 May 2026 (2e12c1)

Implementing Custom Workflows

This page provides a practical guide to creating custom RolloutWorkflow implementations in AReaL. For the base API specification, see RolloutWorkflow API (5.1). For built-in workflow implementations, see Built-in Workflows (5.4).

Purpose and Scope

Custom workflows define task-specific data collection logic for reinforcement learning. They control how prompts are formatted, how model responses are generated, how rewards are computed, and how trajectories are formatted for training.

This page covers:

The WorkflowLike type and workflow variants (RolloutWorkflow, duck-typed patterns)
The duck-typed agent workflow pattern for agentic RL
Step-by-step implementation guide for arun_episode
Performance tracing within workflows
Built-in implementations for vision, multi-turn, and tool-integrated reasoning (TIR) tasks
The Scaffolding framework for modular workflow composition

Sources: areal/api/workflow_api.py:12-113, areal/workflow/rlvr.py:49-178

Workflow Types and the WorkflowLike Protocol

AReaL supports multiple workflow patterns through the WorkflowLike type alias. This allows the system to accept formal class implementations, string import paths, or simple duck-typed objects.
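As a rough illustration of what accepting all three forms involves, the sketch below normalizes a workflow-like value and reports which pattern it matches. `load_from_path` and `classify_workflow` are hypothetical helpers written for this page, not AReaL functions, and the two toy classes are stand-ins; only the method names `arun_episode` and `run` come from the API described on this page.

```python
import importlib
import inspect
from typing import Any


def load_from_path(path: str) -> Any:
    """Import a dotted path such as 'pkg.module.ClassName' (hypothetical helper)."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted import path, got {path!r}")
    return getattr(importlib.import_module(module_name), attr)


def classify_workflow(workflow: Any) -> str:
    """Report which workflow pattern a value matches (hypothetical helper)."""
    if isinstance(workflow, str):
        workflow = load_from_path(workflow)
    # Works for classes and instances alike: both expose the coroutine function.
    if inspect.iscoroutinefunction(getattr(workflow, "arun_episode", None)):
        return "rollout"
    if inspect.iscoroutinefunction(getattr(workflow, "run", None)):
        return "agent"
    raise TypeError(f"{workflow!r} has no async arun_episode() or run() method")


class EchoRollout:
    """Stand-in for a RolloutWorkflow subclass."""

    async def arun_episode(self, engine: Any, data: dict):
        return None


class EchoAgent:
    """Duck-typed agent: no base class, just an async run() method."""

    async def run(self, data: dict, **extra_kwargs) -> float:
        return 1.0


kinds = [classify_workflow(EchoRollout), classify_workflow(EchoAgent())]
```

Checking for a coroutine method rather than a base class is what makes the duck-typed form possible: any object with the right async method qualifies.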

Workflow Type Hierarchy and Code Entities

RolloutWorkflow Pattern

Traditional pattern for explicit trajectory collection:

Implements async def arun_episode(engine: InferenceEngine, data: dict) -> dict[str, Any] | None (areal/api/workflow_api.py:14-16).
Gives direct control over tokenization, generation, and reward computation.
Returning None means the trajectory is rejected and will not be used for training (areal/api/workflow_api.py:19-21).
Defined in areal/api/workflow_api.py:12-37.

Agent Workflow Pattern (Duck-Typed)

High-level pattern for agent framework integration (e.g., using OpenAI SDK):

Implements async def run(data: dict, **extra_kwargs) -> float | dict[str, float] (areal/api/workflow_api.py:75-77).
Receives base_url, http_client, and api_key in extra_kwargs for OpenAI-compatible communication with the AReaL proxy (areal/api/workflow_api.py:84-91).
Note: The AgentWorkflow base class is deprecated (areal/api/workflow_api.py:61-68). Any class with a compatible run() method is recognized as WorkflowLike (areal/api/workflow_api.py:112-113).

Sources: areal/api/workflow_api.py:12-113, areal/workflow/openai/math_agent.py:27-48

Implementation Overview

Custom workflow implementation follows a standard pattern of initialization and episode execution. The workflow acts as the orchestrator between the input data and the inference engine.

Workflow Implementation Pattern

Sources: areal/api/workflow_api.py:12-37, areal/workflow/rlvr.py:139-178

Step-by-Step Implementation Guide

Step 1: Define Workflow Class

Create a class that inherits from RolloutWorkflow (areal/api/workflow_api.py:12).
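A minimal sketch of such a class is shown below. To keep it runnable without AReaL installed, it declares a local stand-in for RolloutWorkflow that mirrors the signature quoted earlier on this page; in a real project the base class would be imported from areal/api/workflow_api.py instead. `ArithmeticWorkflow`, `reward_fn`, and `max_new_tokens` are hypothetical names chosen for illustration.

```python
import abc
from typing import Any, Callable, Dict, Optional


class RolloutWorkflow(abc.ABC):
    """Local stand-in for AReaL's base class (assumption: same method shape)."""

    @abc.abstractmethod
    async def arun_episode(self, engine: Any, data: dict) -> Optional[Dict[str, Any]]:
        ...


class ArithmeticWorkflow(RolloutWorkflow):
    """Hypothetical custom workflow; configuration is captured at construction."""

    def __init__(self, reward_fn: Callable[[str, str], float], max_new_tokens: int = 64):
        self.reward_fn = reward_fn
        self.max_new_tokens = max_new_tokens

    async def arun_episode(self, engine: Any, data: dict) -> Optional[Dict[str, Any]]:
        # Episode logic is the subject of Step 2.
        raise NotImplementedError


workflow = ArithmeticWorkflow(reward_fn=lambda completion, answer: 1.0)
```

Keeping task configuration (reward function, generation limits) in `__init__` leaves `arun_episode` free to depend only on the engine and the data item it receives.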

Step 2: Implement `arun_episode`

The arun_episode method orchestrates the interaction with the InferenceEngine (areal/api/workflow_api.py:14-16). Implementations typically decode prompts, call agenerate, and compute rewards (areal/workflow/rlvr.py:139-162).
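The flow described above can be sketched with stand-ins for the engine and tokenizer. Everything here is an assumption made for illustration: `FakeEngine`, `FakeResponse` and its field names, the character-level `encode`/`decode` helpers, and the simplified `agenerate` that takes token IDs directly rather than a request object. The sketch stops at the reward and leaves tensor formatting to Step 3.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FakeResponse:
    """Stand-in for the engine's response object (field names are assumptions)."""
    input_tokens: List[int]
    output_tokens: List[int]
    output_logprobs: List[float]


class FakeEngine:
    """Stand-in for InferenceEngine: always answers with the text '4'."""

    async def agenerate(self, input_ids: List[int]) -> FakeResponse:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return FakeResponse(input_ids, [ord("4")], [-0.1])


def encode(text: str) -> List[int]:
    """Toy tokenizer: one token per character."""
    return [ord(ch) for ch in text]


def decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)


def exact_match(completion: str, answer: str) -> float:
    return 1.0 if completion.strip() == answer else 0.0


class ArithmeticWorkflow:
    """Hypothetical workflow; a real one would inherit from RolloutWorkflow."""

    def __init__(self, reward_fn: Callable[[str, str], float]):
        self.reward_fn = reward_fn

    async def arun_episode(self, engine: Any, data: dict) -> Optional[Dict[str, Any]]:
        prompt_ids = encode(data["question"])
        resp = await engine.agenerate(prompt_ids)
        if not resp.output_tokens:
            return None  # reject the trajectory: nothing was generated
        completion = decode(resp.output_tokens)
        reward = self.reward_fn(completion, data["answer"])
        return {"response": resp, "reward": reward}


result = asyncio.run(
    ArithmeticWorkflow(exact_match).arun_episode(
        FakeEngine(), {"question": "2 + 2 = ", "answer": "4"}
    )
)
```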

Sources: areal/api/workflow_api.py:12-37, areal/workflow/rlvr.py:139-178

Step 3: Format Trajectory Tensors

The workflow must return a dictionary containing specific tensors. Tensors must be unsqueezed to provide a batch dimension (areal/workflow/rlvr.py:178).

Key	Description	Source
`input_ids`	Concatenated prompt and response tokens	areal/workflow/rlvr.py:171
`logprobs`	Log probabilities of the generated tokens	areal/workflow/rlvr.py:173
`loss_mask`	Mask indicating which tokens contribute to loss	areal/workflow/rlvr.py:172
`rewards`	Scalar reward for the episode	areal/workflow/rlvr.py:176
`versions`	Weight versions used for each token generation	areal/workflow/rlvr.py:174
`attention_mask`	Boolean mask for sequence length	areal/workflow/rlvr.py:175
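The layout of these fields can be sketched without PyTorch by using nested lists as stand-ins for tensors, where wrapping a sequence in an outer list plays the role of `unsqueeze(0)`. `build_trajectory` is a hypothetical helper, and the placeholder values used for prompt positions (0.0 log probability, version -1) are assumptions rather than documented behavior.

```python
from typing import Any, Dict, List


def build_trajectory(
    prompt_ids: List[int],
    output_ids: List[int],
    output_logprobs: List[float],
    output_versions: List[int],
    reward: float,
) -> Dict[str, Any]:
    """Assemble per-episode fields, each with a leading batch dimension of 1."""
    n_prompt, n_out = len(prompt_ids), len(output_ids)
    seq = prompt_ids + output_ids
    return {
        "input_ids": [seq],
        # Assumption: prompt positions carry placeholders because the policy
        # did not generate them.
        "logprobs": [[0.0] * n_prompt + output_logprobs],
        # Only generated tokens contribute to the loss.
        "loss_mask": [[0] * n_prompt + [1] * n_out],
        "versions": [[-1] * n_prompt + output_versions],
        "attention_mask": [[True] * len(seq)],
        "rewards": [float(reward)],
    }


traj = build_trajectory([11, 12, 13], [21, 22], [-0.3, -0.7], [5, 5], reward=1.0)
```

Every per-token field ends up with the same length as `input_ids`, which is what lets the trainer align log probabilities, masks, and versions position by position.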

Sources: areal/workflow/rlvr.py:170-178, areal/workflow/vision_rlvr.py:154-162

Performance Tracing and Monitoring

Workflows should use the PerfTracer and SessionTracer to provide visibility into execution phases like generation and reward calculation.

@session_context(): Marks the start of a rollout session (areal/workflow/rlvr.py:111).
@trace_session("reward"): Categorizes a method as part of the "reward" phase (areal/workflow/rlvr.py:83).
atrace_session_phase("generate"): Async context manager for the "generate" phase (areal/workflow/rlvr.py:130).
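To show the general shape of this instrumentation without depending on AReaL, the sketch below implements two small stand-ins: a decorator that times an async method under a phase name and an async context manager that times a block. `trace_phase`, `atrace_phase`, and `PHASE_TIMES` are hypothetical names and are deliberately not the AReaL tracer API; they only illustrate how a decorator and a context manager can split an episode into named phases.

```python
import asyncio
import functools
import time
from contextlib import asynccontextmanager
from typing import Dict, List

# Phase name -> recorded durations in seconds (stand-in for a tracer backend).
PHASE_TIMES: Dict[str, List[float]] = {}


def record(phase: str, started: float) -> None:
    PHASE_TIMES.setdefault(phase, []).append(time.perf_counter() - started)


def trace_phase(phase: str):
    """Decorator stand-in: times every call of an async function under `phase`."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                record(phase, started)
        return wrapper
    return decorator


@asynccontextmanager
async def atrace_phase(phase: str):
    """Async context manager stand-in: times the enclosed block under `phase`."""
    started = time.perf_counter()
    try:
        yield
    finally:
        record(phase, started)


class TracedWorkflow:
    @trace_phase("reward")
    async def compute_reward(self, completion: str, answer: str) -> float:
        return float(completion == answer)

    async def arun_episode(self, data: dict) -> float:
        async with atrace_phase("generate"):
            await asyncio.sleep(0.01)  # pretend to call the inference engine
            completion = "4"
        return await self.compute_reward(completion, data["answer"])


reward = asyncio.run(TracedWorkflow().arun_episode({"answer": "4"}))
```

A decorator suits phases that map onto a whole method (reward computation), while a context manager suits phases that cover only part of a method (the generation call inside the episode).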

Workflow Performance Instrumentation

Sources: areal/workflow/rlvr.py:83-137, areal/workflow/vision_rlvr.py:45-51

Implementing Duck-Typed Agent Workflows

The duck-typed agent pattern allows external agent frameworks to be integrated without subclassing. The workflow class only needs an async run method (areal/api/workflow_api.py:75-77).
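A minimal sketch of such a class follows. It reads the connection settings from `extra_kwargs` as described earlier on this page and returns a scalar reward. `FakeChatClient` is a stand-in for an OpenAI-compatible SDK client so the example runs offline; `MathAgent`, the data keys, and the URL and key values are placeholders chosen for illustration.

```python
import asyncio
from typing import Any


class FakeChatClient:
    """Stand-in for an OpenAI-compatible client pointed at the proxy."""

    def __init__(self, base_url: str, api_key: str, http_client: Any = None):
        self.base_url = base_url
        self.api_key = api_key
        self.http_client = http_client

    async def complete(self, prompt: str) -> str:
        await asyncio.sleep(0)  # a real client would send an HTTP request here
        return "4"


class MathAgent:
    """No base class: an async run() method is the whole contract."""

    async def run(self, data: dict, **extra_kwargs) -> float:
        client = FakeChatClient(
            base_url=extra_kwargs["base_url"],
            api_key=extra_kwargs["api_key"],
            http_client=extra_kwargs.get("http_client"),
        )
        answer = await client.complete(data["question"])
        # The returned float is the episode reward.
        return 1.0 if answer.strip() == data["answer"] else 0.0


reward = asyncio.run(
    MathAgent().run(
        {"question": "2 + 2 = ?", "answer": "4"},
        base_url="http://localhost:8000/v1",  # placeholder value
        api_key="placeholder-key",  # placeholder value
    )
)
```

Because the agent only talks to an OpenAI-compatible endpoint, it never handles tokens or log probabilities itself; in this pattern the workflow's job reduces to running the task and scoring the outcome.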

Sources: areal/workflow/openai/math_agent.py:27-48, areal/workflow/anthropic/math_agent.py:17-81

Specialized Workflow Examples

Vision RLVR Workflow

Extends RLVRWorkflow to handle multi-modal inputs. It uses an AutoProcessor to encode images and text (areal/workflow/vision_rlvr.py:43). The arun_episode implementation constructs a ModelRequest with image_data (areal/workflow/vision_rlvr.py:122-132) and returns a multi_modal_input key in the result dictionary (areal/workflow/vision_rlvr.py:158).

Multi-Turn Workflow

Orchestrates multiple attempts at solving a task. If the reward is 0.0, it appends a "retry" prompt (multi_turn_prompt_ids) and calls the engine again until max_turns is reached or a positive reward is achieved (areal/workflow/multi_turn.py:76-120). It applies a turn_discount to the final reward based on the number of attempts (areal/workflow/multi_turn.py:120).
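The retry loop can be sketched as follows. The code assumes the discount is applied multiplicatively once per failed attempt, which is one plausible reading of "based on the number of attempts" and is not confirmed by this page. `run_multi_turn`, `fake_generate`, and `fake_reward` are hypothetical names, and generation is replaced by a scripted sequence of outputs.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def run_multi_turn(
    generate: Callable[[List[int]], Awaitable[List[int]]],
    reward_fn: Callable[[List[int]], float],
    prompt_ids: List[int],
    retry_prompt_ids: List[int],
    max_turns: int,
    turn_discount: float,
) -> Tuple[List[int], float]:
    """Retry until the reward is positive or max_turns attempts were made."""
    seq = list(prompt_ids)
    reward, discount = 0.0, 1.0
    for turn in range(max_turns):
        output = await generate(seq)
        seq += output
        reward = reward_fn(output)
        if reward > 0 or turn == max_turns - 1:
            break
        # Failed attempt: append the retry prompt and shrink the final reward.
        seq += retry_prompt_ids
        discount *= turn_discount
    return seq, reward * discount


# Scripted outputs: the third attempt is the correct one.
attempts = iter([[7], [8], [9]])


async def fake_generate(seq: List[int]) -> List[int]:
    return next(attempts)


def fake_reward(output: List[int]) -> float:
    return 1.0 if output == [9] else 0.0


seq, final_reward = asyncio.run(
    run_multi_turn(fake_generate, fake_reward, [1, 2], [0], max_turns=4, turn_discount=0.5)
)
```

With two failed attempts and a discount of 0.5, the successful third attempt yields a final reward of 0.25, so earlier successes are worth more than later ones.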

Tool-Integrated Reasoning (TIR) Workflow

Supports complex multi-turn interactions involving tool calls (e.g., Python execution). It uses a ToolManager to parse tool markers and execute code (examples/tir/tir_workflow.py:72-83). The arun_episode method runs a _multi_round_response loop that continues until a final answer is detected or the turn limit is reached (examples/tir/tir_workflow.py:101-143).
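The control flow of such a loop can be sketched with a toy tool. The marker syntax (`<calc>`, `<result>`, `<answer>`), the `run_calc` tool, and the scripted generator are all hypothetical and are not taken from the TIR example; the real workflow's markers and tools may differ. The sketch only shows the two exit conditions named above: a detected final answer or an exhausted turn limit.

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical markers; the real workflow's marker syntax may differ.
TOOL_RE = re.compile(r"<calc>(.*?)</calc>", re.DOTALL)
FINAL_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_calc(expr: str) -> str:
    """Toy tool: adds integers written as 'a + b + ...' without eval/exec."""
    return str(sum(int(part) for part in expr.split("+")))


async def multi_round_response(
    generate: Callable[[str], Awaitable[str]], prompt: str, max_turns: int
) -> Tuple[str, Optional[str]]:
    """Alternate generation and tool execution until an answer or the limit."""
    transcript = prompt
    for _ in range(max_turns):
        chunk = await generate(transcript)
        transcript += chunk
        final = FINAL_RE.search(chunk)
        if final:
            return transcript, final.group(1).strip()
        call = TOOL_RE.search(chunk)
        if call:
            # Feed the tool output back so the next round can use it.
            transcript += f"<result>{run_calc(call.group(1))}</result>"
    return transcript, None  # turn limit reached without a final answer


# Scripted model: first requests the tool, then answers.
replies = iter(["<calc>2 + 3</calc>", "<answer>5</answer>"])


async def scripted_generate(transcript: str) -> str:
    return next(replies)


transcript, answer = asyncio.run(
    multi_round_response(scripted_generate, "Q: 2 + 3? ", max_turns=4)
)
```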

Scaffolding Framework

A modular framework for composing RL workflows using a Controller/Worker/ScaffoldingLlm pattern (examples/scaffolding/workflow.py:4-12). It uses an SGLangWorker to call OpenAI-compatible APIs (examples/scaffolding/worker.py:101-108) and a PipelineTrajectoryMaker to orchestrate generation and reward controllers (examples/scaffolding/workflow.py:146-148).

Sources: areal/workflow/vision_rlvr.py:103-162, areal/workflow/multi_turn.py:58-135, examples/tir/tir_workflow.py:101-190, examples/scaffolding/workflow.py:42-152


URL: https://deepwiki.com/inclusionAI/AReaL/5.2-implementing-custom-workflows