Last indexed: 7 May 2026 (2e12c1)

Built-in Workflows

This page documents the workflow implementations included with AReaL. Workflows implement the RolloutWorkflow interface areal/api/workflow_api.py12-16 and define the episode generation logic for different task types. AReaL provides primary built-in workflows: RLVRWorkflow for single-turn text tasks, VisionRLVRWorkflow for vision-language tasks, and MultiTurnWorkflow for multi-attempt retry logic. It also includes the TIRWorkflow for tool-integrated reasoning and the ScaffoldingWorkflow for modular agentic RL.

RLVRWorkflow

The RLVRWorkflow is the standard implementation for single-turn reinforcement learning from verbal reinforcement (RLVR). It supports optional "thinking" tokens (reasoning) and handles the standard generate-reward cycle areal/workflow/rlvr.py49-56

Implementation Logic

Input Processing: Uses get_input_ids_fn to convert raw data into tokenized input IDs using a chat template. By default, it uses default_get_input_ids_fn which calls apply_chat_template areal/workflow/rlvr.py31-42 areal/workflow/rlvr.py147-151
Generation: Submits a ModelRequest areal/api/inference_api.py34-47 to the InferenceEngine via engine.agenerate areal/workflow/rlvr.py131
Reward Computation: Decodes the response and calls an asynchronous reward function wrapped by AsyncRewardWrapper areal/workflow/rlvr.py100-107
Tensor Preparation: Packages the sequence, logprobs, loss masks, and rewards into a dictionary of tensors with a batch dimension of 1 for the trainer areal/workflow/rlvr.py165-178

RLVR Workflow Data Flow

The following diagram illustrates the data flow within RLVRWorkflow.arun_episode.

Sources: areal/workflow/rlvr.py49-178 areal/api/workflow_api.py12-38

VisionRLVRWorkflow

VisionRLVRWorkflow extends RLVRWorkflow to support Vision-Language Models (VLMs). It integrates a transformers.AutoProcessor to handle multi-modal inputs areal/workflow/vision_rlvr.py26-43

Key Features

Processor Integration: Initializes with an AutoProcessor to handle image and text interleaving areal/workflow/vision_rlvr.py32-43
Multi-modal Requests: Encodes images to base64 using image2base64 areal/workflow/vision_rlvr.py122 and populates image_data and vision_msg_vllm in the ModelRequest areal/workflow/vision_rlvr.py123-133
VLM Tensor Dict: Includes multi_modal_input (containing pixel_values and optionally image_grid_thw) in the returned dictionary for the training engine areal/workflow/vision_rlvr.py148-167

VLM Workflow Components

Class/Function	File	Role
`VisionRLVRWorkflow`	areal/workflow/vision_rlvr.py26	Main class for VLM rollout logic.
`image2base64`	areal/utils/image.py15	Utility to convert PIL images for inference backends.
`_collect_samples`	areal/workflow/vision_rlvr.py76	Orchestrates generation and reward within a session context.

Sources: areal/workflow/vision_rlvr.py1-168

MultiTurnWorkflow

The MultiTurnWorkflow implements a retry mechanism where the agent is prompted to correct its answer if the initial reward is zero areal/workflow/multi_turn.py19-20

Logic and Discounting

Retry Loop: Continues generating until a non-zero reward is achieved or max_turns is reached areal/workflow/multi_turn.py76-121
Correction Prompt: If an answer is wrong, it appends a system-like message ("Your answer is either wrong...") to the conversation history areal/workflow/multi_turn.py43-57
Reward Discounting: Applies a turn_discount (γ) for each additional turn taken, reducing the final reward value: reward = reward * (turn_discount ^ (turns-1)) areal/workflow/multi_turn.py118-120

Multi-turn Execution Trace

Sources: areal/workflow/multi_turn.py19-136

Tool-Integrated Reasoning (TIR) Workflow

The TIRWorkflow enables agents to use external tools (e.g., Python executors) during multi-turn reasoning episodes examples/tir/tir_workflow.py47-50

Architecture

Tool Manager: Manages tool execution, timeouts, and markers examples/tir/tir_workflow.py72-86
Multi-round Response: A loop that detects tool call markers (e.g., <tool_call>), executes the tool via ToolManager, and appends the result back to the context examples/tir/tir_workflow.py147-210
Cleanup: Ensures remote sandboxes or local resources are cleaned up via tool_manager.acleanup() examples/tir/tir_workflow.py145

Sources: examples/tir/tir_workflow.py47-210

ScaffoldingWorkflow

The ScaffoldingWorkflow provides a modular framework for composing RL workflows using high-level controllers and workers examples/scaffolding/workflow.py42-43

Components

SGLangWorker: Communicates with inference engines via OpenAI-compatible APIs examples/scaffolding/worker.py18
Trajectory Maker: Defines how the episode is structured (e.g., PipelineTrajectoryMaker for simple sequences or TraceTrajectoryMaker for complex agents) examples/scaffolding/controllers.py34-43
ScaffoldingLlm: Orchestrates the interaction between controllers and workers examples/scaffolding/_compat.py31

Scaffolding Entity Map

Sources: examples/scaffolding/workflow.py42-153 examples/scaffolding/search_scaffolding.py86-180

Performance and Session Tracing

All built-in workflows utilize AReaL's tracing system defined in areal/utils/perf_tracer.py to monitor execution phases.

@session_context(): Wraps a collection of phases (like generate and reward) into a single logical session areal/workflow/rlvr.py111 areal/workflow/vision_rlvr.py75
@trace_session("reward"): Explicitly marks the reward computation phase for performance analysis areal/workflow/rlvr.py83 areal/workflow/vision_rlvr.py45
atrace_session_phase("generate"): An async context manager used to time the inference call areal/workflow/rlvr.py130 areal/workflow/vision_rlvr.py94

Sources: areal/workflow/rlvr.py83-130 areal/workflow/vision_rlvr.py45-94

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/5.4-built-in-workflows