
URL: https://deepwiki.com/inclusionAI/AReaL/14.12-scaffolding-framework

Last indexed: 7 May 2026 (2e12c1)

Scaffolding Framework

The Scaffolding Framework is a modular architecture within AReaL designed to simplify the composition of complex Reinforcement Learning (RL) workflows. It provides a structured pattern—Controller/Worker/ScaffoldingLlm—to manage interactions between models, environments, and reward systems. This framework is particularly useful for tasks involving multi-step reasoning, tool usage, or specialized reward logic that goes beyond simple sequence-to-sequence completion.

Architecture Overview

The framework separates the concerns of trajectory generation, environmental interaction, and reward calculation into distinct entities. This modularity allows developers to reuse components across different RL experiments.

Key Components

| Component | Role | Code Entity |
| --- | --- | --- |
| Workflow | Orchestrates high-level episode logic and integrates with AReaL's training pipeline. | ScaffoldingWorkflow (examples/scaffolding/workflow.py:86-110) |
| Controller | Logic for managing episode state, such as SearchAgentController for tool loops. | SearchAgentController (examples/scaffolding/search_scaffolding.py:166-171) |
| Worker | Wraps an inference engine to provide an OpenAI-compatible API for generating responses. | ScaffoldingLlm (examples/scaffolding/search_scaffolding.py:136-151) |
| Trajectory Maker | Converts raw interactions into tensors formatted for AReaL's trainers. | TraceTrajectoryMaker (examples/scaffolding/search_scaffolding.py:173-176) |
| LLM Judge | Specialized controller for using LLMs to evaluate and score agent responses. | LLMJudgeController (examples/scaffolding/search_scaffolding.py:162-164) |

Data Flow and Interaction

The following diagram illustrates how the Scaffolding Framework bridges the gap between high-level workflow logic and the low-level AReaL engine APIs, specifically for agentic tasks like web search.

[Diagram: Scaffolding Framework Component Interaction]


Sources: examples/scaffolding/search_scaffolding.py:86-110, examples/scaffolding/search_scaffolding.py:136-180

Implementation Details

Workflow Integration

The ScaffoldingWorkflow implements the RolloutWorkflow interface (examples/scaffolding/workflow.py:42). Its primary responsibility is to instantiate the scaffolding components and run the arun_episode loop.

In a typical implementation, the workflow:

  1. Receives a batch of data from the Trainer.
  2. Initializes a ScaffoldingLlm via build_scaffolding_llm (examples/scaffolding/search_scaffolding.py:136-151).
  3. Configures controllers like NativeGenerationController for sampling parameters (examples/scaffolding/search_scaffolding.py:159-161).
  4. Delegates the episode to the trajectory maker to produce standard AReaL training tensors.
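
The four steps above can be sketched with stand-in classes. This is a minimal illustration of the control flow only: FakeScaffoldingLlm, FakeTrajectoryMaker, SketchWorkflow, and run_batch are hypothetical names invented here, and none of them reproduce the real signatures of ScaffoldingWorkflow, build_scaffolding_llm, or the AReaL trainer.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeScaffoldingLlm:
    """Hypothetical stand-in for the worker that wraps an inference engine."""

    stop: list = field(default_factory=list)

    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0)  # placeholder for a real engine call
        return f"answer to: {prompt}"


class FakeTrajectoryMaker:
    """Hypothetical stand-in: runs one episode and returns a training record."""

    def __init__(self, llm: FakeScaffoldingLlm):
        self.llm = llm

    async def make(self, item: dict) -> dict:
        completion = await self.llm.generate(item["prompt"])
        # A real trajectory maker would return tensors; a dict keeps this readable.
        return {"prompt": item["prompt"], "completion": completion, "reward": 0.0}


class SketchWorkflow:
    """Mirrors the listed steps; not the real ScaffoldingWorkflow."""

    def __init__(self, sampling: dict):
        self.sampling = sampling  # step 3: sampling parameters

    async def arun_episode(self, item: dict) -> dict:
        llm = FakeScaffoldingLlm(stop=self.sampling.get("stop", []))  # step 2
        maker = FakeTrajectoryMaker(llm)
        return await maker.make(item)  # step 4


async def run_batch(batch: list) -> list:
    """Step 1: take a batch and run one episode per item concurrently."""
    workflow = SketchWorkflow({"temperature": 1.0, "stop": ["</search>"]})
    return await asyncio.gather(*(workflow.arun_episode(x) for x in batch))
```

Calling asyncio.run(run_batch([{"prompt": "a"}, {"prompt": "b"}])) returns one record per input item, in input order.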

Worker and Engine Interaction

The scaffolding system uses ScaffoldingLlm to abstract the InferenceEngine. It maps natural language interactions to the underlying engine's generation capabilities, often using stop strings to manage multi-turn tool-calling boundaries.

[Diagram: Engine to Worker Mapping]


Sources: examples/scaffolding/search_scaffolding.py:152-161, examples/scaffolding/search_scaffolding.py:178-180
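
To make the role of stop strings concrete, the sketch below cuts a raw completion at the earliest stop string so that a controller can run the tool before the model continues. The helper name cut_at_stop and the closing tags </search> and </visit> are assumptions for illustration; the actual stop strings and truncation logic in the example code may differ, and many engines apply stop strings server-side.

```python
from typing import List, Optional, Tuple


def cut_at_stop(text: str, stop_strings: List[str]) -> Tuple[str, Optional[str]]:
    """Truncate text after the earliest stop string; report which one matched."""
    hits = [(text.find(s), s) for s in stop_strings if s in text]
    if not hits:
        return text, None  # no tool call boundary: the turn is a final answer
    index, stop = min(hits)  # earliest starting position wins
    return text[: index + len(stop)], stop


raw = "I should look this up.<search>AReaL</search>The result is..."
turn, matched = cut_at_stop(raw, ["</search>", "</visit>"])
# turn    -> "I should look this up.<search>AReaL</search>"
# matched -> "</search>"
```

Everything after the matched stop string is discarded, since it would be text the model wrote without having seen the tool result.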

Reward and Trajectory Management

The LLMJudgeController coordinates reward computation by using the LLM itself as a judge (examples/scaffolding/search_scaffolding.py:162-164). The TraceTrajectoryMaker ensures that multi-turn data collected during the scaffolding process is correctly aligned for the training step (examples/scaffolding/search_scaffolding.py:173-176).
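
One way to picture the alignment problem is the sketch below, which flattens a list of traced LLM calls into a single token sequence with a loss mask. It assumes each call's input extends the tokens of all previous turns, and the Trace record and flatten function are hypothetical; the real TraceTrajectoryMaker may organize samples differently (for example, one sample per call).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trace:
    """Hypothetical record of one LLM call: token ids in, token ids out."""

    input_ids: List[int]
    output_ids: List[int]


def flatten(traces: List[Trace]) -> Dict[str, List[int]]:
    """Concatenate turns; only model-generated tokens get loss_mask = 1."""
    ids: List[int] = []
    mask: List[int] = []
    for trace in traces:
        if trace.input_ids[: len(ids)] != ids:
            raise ValueError("turn input does not extend the previous turns")
        # Tokens added since the last turn (prompt, tool results) are context.
        context = trace.input_ids[len(ids):]
        ids += context
        mask += [0] * len(context)
        # Tokens sampled by the policy are the actions to train on.
        ids += trace.output_ids
        mask += [1] * len(trace.output_ids)
    return {"input_ids": ids, "loss_mask": mask}


sample = flatten([
    Trace(input_ids=[1, 2], output_ids=[3, 4]),
    Trace(input_ids=[1, 2, 3, 4, 9, 9], output_ids=[5]),
])
# sample["input_ids"] -> [1, 2, 3, 4, 9, 9, 5]
# sample["loss_mask"] -> [0, 0, 1, 1, 0, 0, 1]
```

The tokens 9, 9 stand for a tool result inserted by the controller: they appear in the sequence so later tokens are conditioned on them, but they are masked out of the loss.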

Logprob Handling: Because external controllers or OpenAI-compatible APIs may not return per-token logprobs in the specific format required by AReaL's training backends, the scaffolding framework often uses placeholder logprobs during rollout.

Important: Set recompute_logprob: true in the actor configuration so that the training engine recomputes exact logprobs during the PPO/GRPO update instead of relying on the rollout placeholders.
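
The reason placeholders are harmful if left in place can be shown with the per-token importance ratio used by PPO-style objectives, exp(new_logprob - old_logprob). The numbers below are made up, and the placeholder value 0.0 is an assumption; the sketch only illustrates the arithmetic, not AReaL's loss code.

```python
import math
from typing import List


def importance_ratios(new_logprobs: List[float], old_logprobs: List[float]) -> List[float]:
    """Per-token ratio pi_new / pi_old computed from log-probabilities."""
    return [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]


true_old = [-0.5, -1.2, -0.1]      # what the policy actually assigned at rollout
placeholder_old = [0.0, 0.0, 0.0]  # assumed placeholder written during rollout
new = [-0.5, -1.2, -0.1]           # policy unchanged since rollout

exact = importance_ratios(new, true_old)          # every ratio is 1.0
biased = importance_ratios(new, placeholder_old)  # every ratio is below 1.0
```

With exact old logprobs an unchanged policy yields ratios of 1.0, as expected. With placeholders the ratios collapse to the raw token probabilities, which would distort clipping and the gradient; recomputing logprobs in the training engine avoids this.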

Example: Search Agent Scaffolding

In the search-agent pattern, the framework is used to implement a deep research assistant that conducts multi-source investigations.

| Step | Entity | Action |
| --- | --- | --- |
| System Prompt | SYSTEM_PROMPT | Defines tool signatures (search/visit) in XML tags (examples/scaffolding/search_scaffolding.py:50-83) |
| Generation | SearchAgentController | Drives the tool-calling loop and manages token budgets (examples/scaffolding/search_scaffolding.py:166-171) |
| Scoring | LLMJudgeController | Uses an LLM judge to determine answer correctness (examples/scaffolding/search_scaffolding.py:162-164) |
| Tracing | TraceTrajectoryMaker | Traces each LLM call for PPO training trajectory construction (examples/scaffolding/search_scaffolding.py:173-176) |

Sources: examples/scaffolding/search_scaffolding.py:21-47, examples/scaffolding/search_scaffolding.py:136-180
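
The generation step of this pattern can be sketched as a loop that alternates model turns with tool calls until the model answers or a limit is reached. Everything here is an assumption made for illustration: the <search>/<visit>/<result> tag format, the run_tool_loop and scripted_model helpers, the whitespace-based token count, and the toy tools. It is not the SearchAgentController implementation.

```python
import re
from typing import Callable, Dict, List

# Assumed tag format: <search>query</search> or <visit>url</visit>.
TOOL_CALL = re.compile(r"<(search|visit)>(.*?)</\1>", re.DOTALL)


def run_tool_loop(
    generate: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_turns: int = 4,
    token_budget: int = 200,
) -> str:
    """Alternate model turns and tool calls until an answer or a limit."""
    context = question
    used = 0
    for _ in range(max_turns):
        turn = generate(context)
        used += len(turn.split())  # crude whitespace count, illustration only
        context += turn
        match = TOOL_CALL.search(turn)
        if match is None or used >= token_budget:
            break  # final answer, or budget exhausted
        name, argument = match.group(1), match.group(2).strip()
        context += f"<result>{tools[name](argument)}</result>"
    return context


def scripted_model(turns: List[str]) -> Callable[[str], str]:
    """Hypothetical fake model that replays fixed turns, ignoring the context."""
    iterator = iter(turns)
    return lambda context: next(iterator)


tools = {
    "search": lambda query: f"top hit for {query}",
    "visit": lambda url: f"page text of {url}",
}
model = scripted_model(["<search>AReaL</search>", "The answer is 42."])
transcript = run_tool_loop(model, tools, "Q: what is AReaL? ")
```

After the run, transcript contains the question, the search call, the injected tool result, and the final answer in order.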

Performance Tracing

The Scaffolding Framework components leverage AReaL's PerfTracer and SessionTracer to monitor rollout health. Key phases are wrapped in trace decorators in the underlying RLVRWorkflow logic that scaffolding often inherits or mimics.

Sources: areal/workflow/rlvr.py:83-137, areal/workflow/vision_rlvr.py:45-101
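
As a generic illustration of wrapping a rollout phase in a trace decorator, the sketch below times an async function and records the duration under a phase name. It does not use or imitate the PerfTracer or SessionTracer APIs; trace_phase, PHASE_TIMINGS, and the generate function are hypothetical names built on the standard library only.

```python
import asyncio
import functools
import time

# Hypothetical in-memory sink: phase name -> list of durations in seconds.
PHASE_TIMINGS = {}


def trace_phase(name):
    """Decorator factory that records how long each call of a coroutine takes."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # Record the duration even if the phase raises.
                elapsed = time.perf_counter() - start
                PHASE_TIMINGS.setdefault(name, []).append(elapsed)

        return wrapper

    return decorator


@trace_phase("generate")
async def generate(prompt: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for an engine call
    return prompt.upper()
```

Running asyncio.run(generate("hi")) returns "HI" and appends one timing entry under the "generate" key, which a monitoring layer could then aggregate per phase.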