URL: https://deepwiki.com/inclusionAI/AReaL/14.7-tool-integrated-reasoning-(tir)

Tool-Integrated Reasoning (TIR) | inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

Tool-Integrated Reasoning (TIR)

Tool-Integrated Reasoning (TIR) in AReaL enables agents to perform multi-turn reasoning by interacting with external tools such as Python executors and calculators. This capability is essential for tasks requiring precise computation or symbolic manipulation that LLMs cannot reliably perform internally. The TIR system is built around the TIRWorkflow, which manages the iterative process of generation, tool call parsing, execution, and feedback integration.

TIR Architecture Overview

The TIR architecture bridges the gap between the LLM's natural language generation and deterministic code execution. It consists of three primary layers: the Workflow Layer (TIRWorkflow), the Management Layer (ToolManager), and the Execution Layer (BaseTool implementations).

System Data Flow and Entity Mapping

The following diagram illustrates how high-level reasoning steps map to specific code entities within the AReaL framework.

Diagram: TIR Natural Language to Code Entity Mapping


Sources: examples/tir/tir_workflow.py:101-143, examples/tir/tool_manager.py:134-168, examples/tir/tools/python_tool.py:65-77, examples/tir/tests/test_tir.py:31-60


Tool Management and Execution

The ToolManager acts as the central registry and dispatcher for all available tools. It encapsulates the logic for identifying tool triggers within model output and routing them to the correct executor.

Key Components

| Class/Entity | File Path | Responsibility |
| --- | --- | --- |
| BaseTool | examples/tir/tools/base.py:58-60 | Abstract base class defining the interface for tool execution. |
| ToolCallStatus | examples/tir/tools/base.py:14-19 | Enum defining tool execution outcomes (SUCCESS, ERROR, NOT_FOUND). |
| PythonExecutor | examples/tir/tools/python_tool.py:65-77 | Executes arbitrary Python code snippets using a GenericRuntime. |
| CalculatorTool | examples/tir/tools/calculator_tool.py:11-33 | Specialized tool for basic arithmetic using safe evaluation patterns. |
| ToolManager | examples/tir/tool_manager.py:187-204 | Manages tool lifecycles, including acleanup for sandbox environments. |
| TIRConfig | examples/tir/tir_workflow.py:33-40 | Configuration for turns, timeouts, and tool availability. |
| DaytonaPythonTool | examples/tir/tools/daytona_python_tool.py:1-20 | Optional cloud sandbox backend for persistent state across tool calls. |
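
The registry-and-dispatch role described above can be sketched as follows. This is a minimal illustration, not the AReaL implementation: only the names BaseTool, ToolCallStatus and its three members come from the table. The enum values, the execute signature, EchoTool, SimpleToolManager, register, and dispatch are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict, Tuple


class ToolCallStatus(Enum):
    # Member names follow the table above; the string values are assumed.
    SUCCESS = "success"
    ERROR = "error"
    NOT_FOUND = "not_found"


class BaseTool(ABC):
    """Hypothetical minimal interface; the real BaseTool may differ."""

    name: str

    @abstractmethod
    def execute(self, payload: str) -> Tuple[str, ToolCallStatus]:
        """Run the tool on a text payload and report the outcome."""


class EchoTool(BaseTool):
    """Toy tool used only to exercise the dispatcher."""

    name = "echo"

    def execute(self, payload: str) -> Tuple[str, ToolCallStatus]:
        return payload.upper(), ToolCallStatus.SUCCESS


class SimpleToolManager:
    """Hypothetical stand-in for ToolManager: a name-to-tool registry."""

    def __init__(self) -> None:
        self._tools: Dict[str, BaseTool] = {}

    def register(self, tool: BaseTool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, tool_name: str, payload: str) -> Tuple[str, ToolCallStatus]:
        tool = self._tools.get(tool_name)
        if tool is None:
            return f"unknown tool: {tool_name}", ToolCallStatus.NOT_FOUND
        try:
            return tool.execute(payload)
        except Exception as exc:
            # Report tool failures as text so the model can react to them.
            return f"{type(exc).__name__}: {exc}", ToolCallStatus.ERROR
```

Returning a status alongside the text lets the caller distinguish a missing tool from a tool that ran and failed, which is the distinction the three enum members suggest.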

Tool Execution Flow

The TIRWorkflow orchestrates the interaction loop. Unlike standard RLVRWorkflow implementations, TIRWorkflow keeps generating iteratively until max_turns is reached or a final answer is detected.
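
The loop can be pictured with a small sketch. It assumes a simplified text-only interface: run_tir_loop, the callables it takes, and the <output> wrapper used to feed results back are hypothetical and chosen for illustration. The real TIRWorkflow works with the inference engine and token-level data rather than plain strings.

```python
from typing import Callable, Optional


def run_tir_loop(
    generate: Callable[[str], str],
    run_tool: Callable[[str], Optional[str]],
    is_final: Callable[[str], bool],
    prompt: str,
    max_turns: int,
) -> str:
    """Alternate generation and tool execution for at most max_turns turns."""
    transcript = prompt
    for _ in range(max_turns):
        chunk = generate(transcript)
        transcript += chunk
        if is_final(chunk):
            break  # final answer detected
        tool_output = run_tool(chunk)
        if tool_output is None:
            break  # no tool call and no final answer: nothing left to do
        # Feed the tool result back so the next turn can condition on it.
        transcript += f"\n<output>\n{tool_output}\n</output>\n"
    return transcript


# Stub model: asks for a tool once, then answers after seeing the output.
def fake_generate(transcript: str) -> str:
    if "<output>" in transcript:
        return "The answer is \\boxed{4}."
    return "<python>print(2 + 2)</python>"


def fake_run_tool(chunk: str) -> Optional[str]:
    return "4" if "<python>" in chunk else None


def has_boxed(chunk: str) -> bool:
    return "\\boxed{" in chunk


result = run_tir_loop(fake_generate, fake_run_tool, has_boxed, "Q: 2+2?\n", 4)
```

With the stubs above, the loop stops after two turns: one tool call and one final answer.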

Diagram: TIR Multi-Turn Interaction Flow


Sources: examples/tir/tir_workflow.py:147-210, examples/tir/tools/python_tool.py:26-61, examples/tir/tools/python_tool.py:161-180, examples/tir/tests/test_tir.py:31-60, examples/tir/tool_manager.py:214-230


Implementation Details

Python Tool Integration

The PythonExecutor uses a modified version of the qwen_agent tool architecture (examples/tir/tools/python_tool.py:65-77). It extracts code with regular expressions from both Markdown fences (three backticks followed by python) and XML-style <python> tags (examples/tir/tools/python_tool.py:26-61).
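
A minimal sketch of this kind of extraction is shown below. It is not the code in python_tool.py: the function name extract_python_snippet, the exact patterns, and the choice to return the last block found are assumptions made for the example.

```python
import re
from typing import Optional

# The fence is built indirectly so this example can itself sit in a fenced block.
FENCE = "`" * 3

_MARKDOWN_BLOCK = re.compile(
    re.escape(FENCE) + r"python[ \t]*\n(.*?)" + re.escape(FENCE), re.DOTALL
)
_XML_BLOCK = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def extract_python_snippet(text: str) -> Optional[str]:
    """Return the last code block in either style, or None if there is none."""
    matches = []
    for pattern in (_MARKDOWN_BLOCK, _XML_BLOCK):
        matches.extend(pattern.finditer(text))
    if not matches:
        return None
    # Assumption: the most recent block is the one the model wants executed.
    last = max(matches, key=lambda m: m.start())
    return last.group(1).strip()
```

The re.DOTALL flag matters here because code blocks span multiple lines, and the lazy quantifier keeps two separate blocks from being merged into one match.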

Execution runs in a ProcessPool with a timeout_length so that a slow or hanging snippet cannot block the training loop (examples/tir/tools/python_tool.py:168-179). Standard output is captured with io.StringIO and redirect_stdout, and the program output is returned to the model (examples/tir/tools/python_tool.py:131-136).
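
The two ideas can be sketched separately. The first function shows stdout capture with io.StringIO and redirect_stdout, as described above. The second illustrates the timeout, but it swaps in subprocess.run with a timeout in place of the process pool used by the source, because that keeps the example short and portable. Both function names are hypothetical.

```python
import io
import subprocess
import sys
from contextlib import redirect_stdout
from typing import Tuple


def capture_stdout(code: str) -> str:
    """Run code in-process and return everything it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        exec(code, {})  # fresh namespace per call; this is NOT a security sandbox
    return buffer.getvalue()


def run_with_timeout(code: str, timeout_s: float) -> Tuple[str, bool]:
    """Run code in a child interpreter; return (stdout, timed_out)."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return "", True
    return done.stdout, False
```

Running the snippet in a separate process is what makes the timeout enforceable: an in-process exec of an infinite loop cannot be interrupted safely from the same thread.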

Workflow and Episode Management

The TIRWorkflow.arun_episode method cleans up tools after every episode via tool_manager.acleanup(), which is critical when using remote sandboxes such as Daytona (examples/tir/tir_workflow.py:101-145). It also dynamically builds a system prompt containing the tool descriptions provided by the ToolRegistry (examples/tir/tool_manager.py:138-148).
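
One common way to guarantee per-episode cleanup is a try/finally around the rollout, sketched below. Only the acleanup name comes from the text above. FakeToolManager, build_system_prompt, run_episode, and the prompt wording are hypothetical, and whether arun_episode uses exactly this structure is an assumption.

```python
import asyncio
from typing import Awaitable, Callable


class FakeToolManager:
    """Hypothetical stand-in that records whether cleanup ran."""

    def __init__(self) -> None:
        self.cleaned_up = False
        self.descriptions = {
            "python": "Execute Python code and return its stdout.",
            "calculator": "Evaluate a basic arithmetic expression.",
        }

    def build_system_prompt(self) -> str:
        # One bullet per registered tool, so the model knows what it can call.
        lines = ["You may call the following tools:"]
        lines += [f"- {name}: {desc}" for name, desc in self.descriptions.items()]
        return "\n".join(lines)

    async def acleanup(self) -> None:
        self.cleaned_up = True


async def run_episode(
    manager: FakeToolManager,
    rollout: Callable[[str], Awaitable[str]],
) -> str:
    system_prompt = manager.build_system_prompt()
    try:
        return await rollout(system_prompt)
    finally:
        # Runs on success, on exceptions, and on cancellation.
        await manager.acleanup()
```

The finally block is what makes the guarantee hold when a rollout raises, which matters most for remote sandboxes that would otherwise stay allocated.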

Persistent State with Daytona

When daytona_python is enabled in tir_config.yaml, the agent uses a cloud-based sandbox that maintains Python state (variables, imports) across multiple tool calls within a single reasoning trajectory (examples/tir/README.md:209-230). This is managed by the DaytonaPythonTool, which requires the sandbox extra dependency (examples/tir/tool_manager.py:21-29).
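
What "persistent state" means for the model can be illustrated with a purely local stand-in. The sketch below does not use the Daytona SDK and is not how DaytonaPythonTool is implemented. PersistentSession is a hypothetical class that keeps one namespace alive across calls, so later snippets can see earlier variables and imports.

```python
import io
from contextlib import redirect_stdout


class PersistentSession:
    """Local illustration only: one shared namespace per trajectory."""

    def __init__(self) -> None:
        self._namespace: dict = {}

    def run(self, code: str) -> str:
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            # Reusing the same dict is what carries state between calls.
            exec(code, self._namespace)
        return buffer.getvalue()

    def close(self) -> None:
        # Dropping the namespace mirrors tearing down a sandbox after an episode.
        self._namespace.clear()


session = PersistentSession()
session.run("import math\nr = 2")                    # first tool call defines state
area_check = session.run("print(math.pi * r ** 2 > 12)")  # second call reuses it
session.close()
```

A fresh session would raise NameError on the second snippet, which is the behavior a stateless executor exhibits when the model assumes earlier variables still exist.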

Reward Scoring and Data

In TIR training, rewards are typically calculated based on the correctness of the final answer extracted from the reasoning trace.

| Function | File Path | Role |
| --- | --- | --- |
| math_reward_fn | examples/tir/train_tir.py:15 | Main entry point for TIR math rewards, comparing model output to ground truth. |
| extract_python_code | examples/tir/tools/python_tool.py:26-61 | Regex-based utility to isolate code blocks from model generation. |
| get_torl_data_rl_dataset | areal/dataset/torl_data.py:71-88 | Loads and formats the ToRL dataset for TIR training, wrapping answers in \boxed{}. |
| get_gsm8k_rl_dataset | areal/dataset/gsm8k.py:31-49 | Prepares GSM8K data with instructions for the model to use tools and format answers. |
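
A final-answer reward of this kind can be sketched as below. This is not math_reward_fn: the names extract_last_boxed and simple_math_reward are hypothetical, the comparison is plain string equality with no mathematical normalization, and the ground truth is assumed to be passed without its \boxed{} wrapper.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 0
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return text[begin:i].strip()
            depth -= 1
    return None  # unbalanced braces: treat as no answer


def simple_math_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the boxed answer matches the ground truth exactly, else 0.0."""
    predicted = extract_last_boxed(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == ground_truth.strip() else 0.0
```

Brace counting is used in place of a regex so that answers such as \boxed{\frac{1}{2}} are extracted whole. A production reward would typically also normalize equivalent forms (for example 0.5 and 1/2), which this sketch deliberately omits.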

Sources: examples/tir/tir_workflow.py:79-82, examples/tir/tools/python_tool.py:26-61, areal/dataset/torl_data.py:81-85, examples/tir/README.md:105-112