Last indexed: 7 May 2026 (2e12c1)

Scaffolding Framework

The Scaffolding Framework is a modular architecture within AReaL designed to simplify the composition of complex Reinforcement Learning (RL) workflows. It provides a structured pattern—Controller/Worker/ScaffoldingLlm—to manage interactions between models, environments, and reward systems. This framework is particularly useful for tasks involving multi-step reasoning, tool usage, or specialized reward logic that goes beyond simple sequence-to-sequence completion.

Architecture Overview

The framework separates the concerns of trajectory generation, environmental interaction, and reward calculation into distinct entities. This modularity allows developers to reuse components across different RL experiments.

Key Components

Component	Role	Code Entity
Workflow	Orchestrates high-level episode logic and integrates with AReaL's training pipeline.	`ScaffoldingWorkflow` (examples/scaffolding/workflow.py:86-110)
Controller	Manages episode state and control flow, for example the tool-calling loop.	`SearchAgentController` (examples/scaffolding/search_scaffolding.py:166-171)
Worker	Wraps an inference engine to provide an OpenAI-compatible API for generating responses.	`ScaffoldingLlm` (examples/scaffolding/search_scaffolding.py:136-151)
Trajectory Maker	Converts raw interactions into tensors formatted for AReaL's trainers.	`TraceTrajectoryMaker` (examples/scaffolding/search_scaffolding.py:173-176)
LLM Judge	Specialized controller that uses an LLM to evaluate and score agent responses.	`LLMJudgeController` (examples/scaffolding/search_scaffolding.py:162-164)
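
The separation of roles in the table can be illustrated with a minimal sketch. None of the classes below are AReaL's: `EchoWorker`, `LoopController`, and `judge` are hypothetical stand-ins that only show where generation, episode state, and scoring live.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One prompt/response exchange recorded during an episode."""
    prompt: str
    response: str


class EchoWorker:
    """Hypothetical worker: stands in for an LLM-backed generation service."""

    async def generate(self, prompt: str) -> str:
        return f"echo:{prompt}"


@dataclass
class LoopController:
    """Hypothetical controller: owns episode state and decides the next prompt."""
    max_turns: int = 2
    turns: list = field(default_factory=list)

    async def run(self, worker, question: str) -> list:
        prompt = question
        for _ in range(self.max_turns):
            response = await worker.generate(prompt)
            self.turns.append(Turn(prompt, response))
            prompt = response  # feed the last response back in as the next prompt
        return self.turns


def judge(turns, expected: str) -> float:
    """Hypothetical reward: 1.0 if the final response contains `expected`."""
    return float(bool(turns) and expected in turns[-1].response)
```

Because the controller only depends on an object with an async `generate` method, the same controller could be paired with a different worker or a different reward function without changes.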

Data Flow and Interaction

The following diagram illustrates how the Scaffolding Framework bridges the gap between high-level workflow logic and the low-level AReaL engine APIs, specifically for agentic tasks like web search.

[Diagram: Scaffolding Framework Component Interaction]

Sources: examples/scaffolding/search_scaffolding.py:86-110, examples/scaffolding/search_scaffolding.py:136-180

Implementation Details

Workflow Integration

The ScaffoldingWorkflow implements the RolloutWorkflow interface (examples/scaffolding/workflow.py:42). Its primary responsibility is to instantiate the scaffolding components and execute the arun_episode loop.

In a typical implementation, the workflow:

Receives a batch of data from the Trainer.
Initializes a ScaffoldingLlm via build_scaffolding_llm (examples/scaffolding/search_scaffolding.py:136-151).
Configures controllers like NativeGenerationController for sampling parameters (examples/scaffolding/search_scaffolding.py:159-161).
Delegates the episode to the trajectory maker to produce standard AReaL training tensors.
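
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in examples/scaffolding: `FakeEngine`, `FakeLlm`, `FakeTrajectoryMaker`, and `SketchWorkflow` are hypothetical, the `(engine, data)` signature of `arun_episode` is assumed, and the result is a plain dict instead of tensors.

```python
import asyncio


class FakeEngine:
    """Hypothetical inference engine: upper-cases the prompt instead of sampling."""

    async def generate(self, prompt: str, **sampling) -> str:
        return prompt.upper()


class FakeLlm:
    """Stand-in for the wrapper that build_scaffolding_llm would return."""

    def __init__(self, engine, sampling):
        self.engine = engine
        self.sampling = sampling

    async def generate(self, prompt: str) -> str:
        return await self.engine.generate(prompt, **self.sampling)


class FakeTrajectoryMaker:
    """Stand-in trajectory maker: returns a plain dict rather than tensors."""

    async def make(self, llm, sample):
        completion = await llm.generate(sample["prompt"])
        reward = float(completion == sample["answer"])
        return {"prompt": sample["prompt"], "completion": completion, "reward": reward}


class SketchWorkflow:
    def __init__(self, sampling):
        # Sampling parameters that a generation controller would hold.
        self.sampling = sampling

    async def arun_episode(self, engine, data):
        llm = FakeLlm(engine, self.sampling)  # build the LLM wrapper
        maker = FakeTrajectoryMaker()         # choose the trajectory maker
        return await maker.make(llm, data)    # delegate the episode to it


async def run_batch(workflow, engine, batch):
    # One episode per sample, run concurrently.
    return await asyncio.gather(*(workflow.arun_episode(engine, s) for s in batch))
```

A batch could then be run with `asyncio.run(run_batch(SketchWorkflow({"temperature": 1.0}), FakeEngine(), batch))`, where each element of `batch` is a dict with `prompt` and `answer` keys.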

Worker and Engine Interaction

The scaffolding system uses ScaffoldingLlm to abstract the InferenceEngine. It exposes the engine's generation capabilities through an OpenAI-compatible interface and often relies on stop strings to end generation at tool-call boundaries in multi-turn episodes.
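
To make the role of stop strings concrete, here is a small sketch of cutting a completion at the first tool-call boundary. The function `truncate_at_stop` is hypothetical, and the closing tags used as stop strings are an assumption based on the search/visit tools described later on this page.

```python
from __future__ import annotations


def truncate_at_stop(text: str, stop_strings: list[str]) -> tuple[str, str | None]:
    """Cut `text` right after the earliest stop string, keeping the stop string.

    Returns the (possibly truncated) text and the stop string that fired,
    or None if no stop string occurs in the text.
    """
    best_idx, fired = len(text), None
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1 and idx < best_idx:
            best_idx, fired = idx, stop
    if fired is None:
        return text, None
    return text[: best_idx + len(fired)], fired
```

Stopping at the closing tag keeps the model from writing its own tool result: the controller can run the real tool and append its output before asking for the next turn.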

[Diagram: Engine to Worker Mapping]

Sources: examples/scaffolding/search_scaffolding.py:152-161, examples/scaffolding/search_scaffolding.py:178-180

Reward and Trajectory Management

The LLMJudgeController computes rewards by using an LLM as a judge (examples/scaffolding/search_scaffolding.py:162-164). The TraceTrajectoryMaker ensures that multi-turn data collected during the scaffolding process is correctly aligned for the training step (examples/scaffolding/search_scaffolding.py:173-176).
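
One common way to align multi-turn data for training is to concatenate all segments into one token sequence and mark which tokens the model produced. The sketch below assumes that interpretation; `build_trajectory` and the `input_ids`/`loss_mask` field names are illustrative and are not taken from `TraceTrajectoryMaker`.

```python
def build_trajectory(segments):
    """Flatten conversation segments into one sequence with a loss mask.

    `segments` is a list of (token_ids, from_model) pairs in conversation
    order. Tokens from prompts or tool results get mask 0, so only tokens
    the model generated contribute to the policy loss.
    """
    input_ids, loss_mask = [], []
    for token_ids, from_model in segments:
        input_ids.extend(token_ids)
        loss_mask.extend([1 if from_model else 0] * len(token_ids))
    return {"input_ids": input_ids, "loss_mask": loss_mask}
```

For example, a prompt, a model turn, a tool result, and a second model turn would yield a mask that is zero over the prompt and tool result and one over both model turns.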

Logprob Handling: Because external controllers or OpenAI-compatible APIs may not return per-token logprobs in the specific format required by AReaL's training backends, the scaffolding framework often uses placeholder logprobs during rollout.

Important: It is recommended to set recompute_logprob: true in the actor configuration so the training engine recomputes exact logprobs during the PPO/GRPO update.
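
The sketch below illustrates why recomputation matters. PPO-style updates use the ratio exp(new_logprob - old_logprob), so a placeholder in place of the old logprob would distort that ratio. Everything here is a toy: the placeholder value 0.0, `rollout_record`, `recompute_logprobs`, and the `logits_fn` callback are assumptions, not AReaL's implementation.

```python
import math

# Assumed sentinel; the value the framework actually uses is not stated here.
PLACEHOLDER_LOGPROB = 0.0


def rollout_record(token_ids):
    """Record a rollout whose logprobs are placeholders, not real values."""
    return {"input_ids": list(token_ids),
            "logprobs": [PLACEHOLDER_LOGPROB] * len(token_ids)}


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def recompute_logprobs(record, logits_fn):
    """Return a copy of `record` with logprobs recomputed by the training-side model.

    `logits_fn(prefix)` is a hypothetical callback that returns next-token
    logits given the tokens before the current position.
    """
    ids = record["input_ids"]
    recomputed = []
    for pos, token in enumerate(ids):
        logits = logits_fn(ids[:pos])
        recomputed.append(log_softmax(logits)[token])
    return {**record, "logprobs": recomputed}
```

With a toy model that is uniform over a four-token vocabulary, every recomputed value is -log(4), while the original record still holds the placeholders.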

Example: Search Agent Scaffolding

In the search-agent pattern, the framework is used to implement a deep research assistant that conducts multi-source investigations.

Step	Entity	Action
System Prompt	`SYSTEM_PROMPT`	Defines tool signatures (search/visit) in XML tags (examples/scaffolding/search_scaffolding.py:50-83)
Generation	`SearchAgentController`	Drives the tool-calling loop and manages token budgets (examples/scaffolding/search_scaffolding.py:166-171)
Scoring	`LLMJudgeController`	Uses an LLM judge to determine answer correctness (examples/scaffolding/search_scaffolding.py:162-164)
Tracing	`TraceTrajectoryMaker`	Traces each LLM call for PPO training trajectory construction (examples/scaffolding/search_scaffolding.py:173-176)
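
A tool-calling loop with a token budget, as described in the Generation row, might look roughly like the sketch below. The tag format `<search>...</search>` / `<visit>...</visit>`, the `<result>` wrapper, and the whitespace-based token count are all assumptions for illustration; the actual prompt format is defined in `SYSTEM_PROMPT` and is not reproduced here.

```python
import re

# Assumed tag format: <search>query</search> or <visit>url</visit>.
TOOL_CALL = re.compile(r"<(search|visit)>(.*?)</\1>", re.DOTALL)


def parse_tool_call(text):
    """Return (tool_name, argument) for the last tool call in `text`, or None."""
    matches = list(TOOL_CALL.finditer(text))
    if not matches:
        return None
    last = matches[-1]
    return last.group(1), last.group(2).strip()


def run_tool_loop(generate, tools, question, max_tokens=64, max_turns=4):
    """Alternate between generation and tool execution until the model stops
    calling tools, the token budget is spent, or `max_turns` is reached."""
    context = question
    used = 0
    for _ in range(max_turns):
        reply = generate(context)
        used += len(reply.split())  # whitespace tokens as a crude budget proxy
        context += "\n" + reply
        call = parse_tool_call(reply)
        if call is None or used >= max_tokens:
            break
        name, argument = call
        context += f"\n<result>{tools[name](argument)}</result>"
    return context, used
```

Here `generate` is any callable that maps the running context to the next model reply, and `tools` maps a tool name to a function of one string argument.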

Sources: examples/scaffolding/search_scaffolding.py:21-47, examples/scaffolding/search_scaffolding.py:136-180

Performance Tracing

The Scaffolding Framework components leverage AReaL's PerfTracer and SessionTracer to monitor rollout health. Key phases are wrapped in trace decorators in the underlying RLVRWorkflow logic that scaffolding often inherits or mimics:

@trace_session("reward"): Tracks the time taken to compute rewards (areal/workflow/rlvr.py:83)
@session_context(): Manages the lifecycle of a rollout session (areal/workflow/rlvr.py:111)
atrace_session_phase("generate"): Tracks the duration of the inference engine call (areal/workflow/rlvr.py:130)
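
The general pattern behind such hooks (a decorator around a whole coroutine and an async context manager around one phase) can be sketched with the standard library alone. The names `timed`, `phase`, and `TIMINGS` are hypothetical and do not reflect the signatures of AReaL's PerfTracer or SessionTracer.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from functools import wraps

# Collected durations in seconds, keyed by phase name.
TIMINGS = {}


def timed(name):
    """Decorator that records how long an async function takes under `name`."""
    def decorator(fn):
        @wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                TIMINGS.setdefault(name, []).append(time.perf_counter() - start)
        return wrapper
    return decorator


@asynccontextmanager
async def phase(name):
    """Async context manager that records the duration of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS.setdefault(name, []).append(time.perf_counter() - start)


@timed("reward")
async def compute_reward(completion, answer):
    return float(completion.strip() == answer)


async def episode(prompt, answer):
    async with phase("generate"):
        await asyncio.sleep(0)  # placeholder for the inference engine call
        completion = answer
    return await compute_reward(completion, answer)
```

Recording in a `finally` block means a phase is still timed when it raises, which helps when diagnosing rollouts that fail partway through.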

Sources: areal/workflow/rlvr.py:83-137, areal/workflow/vision_rlvr.py:45-101


URL: https://deepwiki.com/inclusionAI/AReaL/14.12-scaffolding-framework