
URL: https://deepwiki.com/inclusionAI/AReaL/14.13-terminal-bench-agent-training

Last indexed: 7 May 2026 (2e12c1)

Terminal Bench Agent Training

This page describes how terminal agents are implemented and configured for training on the Terminal Bench benchmark within the AReaL framework. The integration enables Reinforcement Learning (RL) on agents that interact with real operating system environments through a terminal, using Docker for task isolation and CAMEL for agent scaffolding.

The Terminal Bench implementation in AReaL is located in examples/terminal_bench/ and is designed to scale across GPU and NPU (Ascend) clusters using PPO or GRPO algorithms (examples/terminal_bench/README.md:5-11).

System Architecture

The Terminal Bench training pipeline bridges high-level agent logic with low-level environment execution. It uses a specialized RolloutWorkflow to manage the lifecycle of Docker-based task environments.

Core Components

| Component | Code Entity | Responsibility |
| --- | --- | --- |
| Training Entry | train.py | Entry point that loads config, builds the dataset, and launches AReaL training (examples/terminal_bench/README.md:35-36) |
| Rollout Workflow | CamelRLVRWorkflow | Rollout workflow that builds task images, runs trajectories, and collects rewards (examples/terminal_bench/README.md:37-39) |
| Agent Wrapper | CamelTerminalAgent | Manages the CAMEL ChatAgent, tool definitions, and task-specific environment resets (examples/terminal_bench/agent/camel_terminal_agent.py:38-50) |
| Environment | Terminal / DockerComposeManager | Provides the interface to the Dockerized terminal and manages container lifecycle (examples/terminal_bench/agent/camel_terminal_agent.py:19-20) |
| Traced Agent | ChatAgentTrace | A ChatAgent subclass with performance tracing and JSON parse error handling (examples/terminal_bench/agent/chat_agent_trace.py:67-73) |
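
As a rough illustration of the rollout workflow's role (running several trajectories for one task and collecting their rewards), the following is a minimal, self-contained sketch. It does not use AReaL or CAMEL: `run_episode` and `demo_trajectory` are hypothetical names, and treating a failed trajectory as reward 0.0 is an assumption made for this example, not confirmed behavior of CamelRLVRWorkflow.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical signature for "run one trajectory and return its reward".
TrajectoryFn = Callable[[str, int], Awaitable[float]]


async def demo_trajectory(task_id: str, traj_idx: int) -> float:
    """Stand-in for one agent rollout; a real one would drive a Docker task."""
    await asyncio.sleep(0)  # yield control, as real container I/O would
    if traj_idx == 2:
        raise RuntimeError("simulated container failure")
    return 1.0 if traj_idx == 0 else 0.0


async def run_episode(task_id: str, n_trajs: int, run_one: TrajectoryFn) -> List[float]:
    """Run n_trajs trajectories concurrently and collect one reward per trajectory."""
    results = await asyncio.gather(
        *(run_one(task_id, i) for i in range(n_trajs)),
        return_exceptions=True,  # keep going if a single trajectory fails
    )
    # Assumption for this sketch: a failed trajectory contributes reward 0.0.
    return [r if isinstance(r, float) else 0.0 for r in results]


if __name__ == "__main__":
    print(asyncio.run(run_episode("demo-task", 4, demo_trajectory)))
```

Gathering with `return_exceptions=True` keeps one broken container from discarding the rewards of the other trajectories in the same group.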

Data Flow and Entity Mapping

The following diagram shows how AReaL's training abstractions map to the Terminal Bench execution entities.

Entity Interaction Diagram


Sources: examples/terminal_bench/README.md:33-46, examples/terminal_bench/agent/camel_terminal_agent.py:38-72, examples/terminal_bench/agent/chat_agent_trace.py:67-73

Agent Implementation

The agent logic relies on the CamelTerminalAgent class, which handles the transition from natural language instructions to terminal actions.

Task Execution Lifecycle

The run_agent method follows a strict sequence to ensure environment consistency:

  1. Environment Reset: Calls _reset_env to spin up a fresh Docker container via DockerComposeManager. This step is bounded by a timeout defined in TaskTimeouts (examples/terminal_bench/agent/camel_terminal_agent.py:84-93).
  2. Agent Initialization: Configures the ChatAgentTrace with a system prompt and tools via _reset_agent (examples/terminal_bench/agent/camel_terminal_agent.py:96-104).
  3. Step Execution: Invokes agent.astep(prompt), in which the agent generates thoughts and tool calls (examples/terminal_bench/agent/camel_terminal_agent.py:111).
  4. Reward Evaluation: Executes _evaluate_completion_sync to verify whether the task goal was achieved and sets the reward in the client (examples/terminal_bench/agent/camel_terminal_agent.py:133-138).
  5. Cleanup: Shuts down the Docker environment using _close_env (examples/terminal_bench/agent/camel_terminal_agent.py:150-157).
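
The five steps above can be sketched as a single coroutine. This is a simplified illustration, not the actual run_agent code: `FakeTerminalAgent` is a hypothetical stand-in that records calls instead of starting containers, evaluation is modeled as async although the real method is named _evaluate_completion_sync, and returning reward 0.0 on timeout is an assumption. The default timeouts mirror the TaskTimeouts values listed on this page.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeTerminalAgent:
    """Hypothetical stand-in that records lifecycle calls instead of using Docker."""
    reward: float = 1.0
    step_delay: float = 0.0
    calls: List[str] = field(default_factory=list)

    async def reset_env(self) -> None:
        self.calls.append("reset_env")

    async def reset_agent(self) -> None:
        self.calls.append("reset_agent")

    async def astep(self, prompt: str) -> None:
        self.calls.append("astep")
        await asyncio.sleep(self.step_delay)

    async def evaluate(self) -> float:
        self.calls.append("evaluate")
        return self.reward

    async def close_env(self) -> None:
        self.calls.append("close_env")


async def run_agent(agent, prompt: str, reset_timeout: float = 1800.0,
                    step_timeout: float = 300.0, eval_timeout: float = 1200.0) -> float:
    """Reset, initialize, step, evaluate, and always clean up."""
    reward = 0.0
    try:
        await asyncio.wait_for(agent.reset_env(), timeout=reset_timeout)   # step 1
        await agent.reset_agent()                                          # step 2
        await asyncio.wait_for(agent.astep(prompt), timeout=step_timeout)  # step 3
        reward = await asyncio.wait_for(agent.evaluate(), timeout=eval_timeout)  # step 4
    except asyncio.TimeoutError:
        reward = 0.0  # assumption: a timed-out trajectory earns no reward
    finally:
        await agent.close_env()                                            # step 5
    return reward
```

Placing the cleanup in `finally` means the environment is shut down even when a step times out, which matters when each trajectory owns a Docker container.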

Performance Tracing and Error Handling

ChatAgentTrace extends the standard CAMEL ChatAgent with performance tracing and JSON parse error handling.

Sources: examples/terminal_bench/agent/camel_terminal_agent.py:66-162, examples/terminal_bench/agent/chat_agent_trace.py:76-188
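
To make the two ideas concrete, here is a minimal sketch of timing a code region and of tolerating malformed JSON in tool-call arguments. It is not taken from chat_agent_trace.py: the helper names `trace` and `parse_tool_args` and the `_parse_error` key are hypothetical, chosen only for this example.

```python
import json
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def trace(name: str, sink: Dict[str, float]) -> Iterator[None]:
    """Accumulate wall-clock time spent inside the block under sink[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[name] = sink.get(name, 0.0) + (time.perf_counter() - start)


def parse_tool_args(raw: str) -> Dict[str, Any]:
    """Parse tool-call arguments; report bad JSON instead of raising."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Hypothetical convention: surface the error so the caller can
        # feed it back to the model rather than abort the trajectory.
        return {"_parse_error": str(exc)}
    if not isinstance(parsed, dict):
        return {"_parse_error": "tool arguments must be a JSON object"}
    return parsed


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    with trace("parse", timings):
        print(parse_tool_args('{"command": "ls -la"}'))
        print(parse_tool_args('{"command": '))
    print(timings)
```

Returning a structured error rather than raising lets a long rollout continue after a single malformed tool call.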

Configuration and NPU Support

AReaL supports both NVIDIA GPUs (via SGLang) and Ascend NPUs (via vLLM) for Terminal Bench training.

Configuration Structure

The configuration extends GRPOConfig with terminal-specific parameters defined in AgentRLConfig (examples/terminal_bench/agent_rl_config.py:16-26).

| Parameter | Description |
| --- | --- |
| n_trajs | Number of trajectories per task (examples/terminal_bench/agent_rl_config.py:17) |
| max_iteration | Maximum number of tool-use turns (examples/terminal_bench/agent_rl_config.py:19) |
| non_think_mode | Toggle for specific prompt formatting (e.g., for non-reasoning models) (examples/terminal_bench/agent_rl_config.py:21) |
| task_timeouts | Dataclass defining limits for _reset_env (1800s), agent_astep (300s), and _evaluate (1200s) (examples/terminal_bench/agent_rl_config.py:7-12) |
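
The shape of this configuration can be sketched with plain dataclasses. This is an approximation for illustration only: the real AgentRLConfig extends GRPOConfig, which is omitted here, and the default values for n_trajs, max_iteration, and non_think_mode are hypothetical placeholders. Only the three timeout values come from this page, and the timeout field names simply reuse the labels from the table.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTimeouts:
    """Per-phase limits in seconds; values as listed in the table above."""
    _reset_env: float = 1800.0
    agent_astep: float = 300.0
    _evaluate: float = 1200.0


@dataclass
class AgentRLConfigSketch:
    """Standalone sketch; the real class extends GRPOConfig."""
    n_trajs: int = 1              # hypothetical default
    max_iteration: int = 10       # hypothetical default
    non_think_mode: bool = False  # hypothetical default
    # default_factory gives every config instance its own TaskTimeouts object.
    task_timeouts: TaskTimeouts = field(default_factory=TaskTimeouts)


if __name__ == "__main__":
    cfg = AgentRLConfigSketch(n_trajs=4)
    print(cfg)
```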

NPU Integration

For Ascend NPU clusters, config_tb_vllm_npu.yaml selects the vllm backend. It requires specific environment variables for HCCL and ACL stability, such as HCCL_EXEC_TIMEOUT and ACL_DEVICE_SYNC_TIMEOUT (examples/terminal_bench/config_tb_vllm_npu.yaml:89-100).
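
As an illustration of supplying such variables from a launcher script, the sketch below applies a set of defaults without overwriting values that are already set. The helper `apply_npu_env` is hypothetical, and the numeric values are placeholders rather than the values used in config_tb_vllm_npu.yaml; consult that file for the actual settings.

```python
import os
from typing import Dict, Mapping, MutableMapping, Optional

# Placeholder values for illustration only; not taken from the YAML file.
EXAMPLE_NPU_ENV: Dict[str, str] = {
    "HCCL_EXEC_TIMEOUT": "1800",
    "ACL_DEVICE_SYNC_TIMEOUT": "1800",
}


def apply_npu_env(
    defaults: Mapping[str, str],
    env: Optional[MutableMapping[str, str]] = None,
) -> Dict[str, str]:
    """Set each variable unless it already exists; return the effective values."""
    target = os.environ if env is None else env
    for key, value in defaults.items():
        target.setdefault(key, str(value))  # keep any operator-provided value
    return {key: target[key] for key in defaults}


if __name__ == "__main__":
    print(apply_npu_env(EXAMPLE_NPU_ENV, env={}))
```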

NPU-Specific vLLM Config


Sources: examples/terminal_bench/config_tb_vllm_npu.yaml:127-134, examples/terminal_bench/agent_rl_config.py:6-26

Environment Setup

Terminal Bench tasks require the host Docker socket to be mounted into the AReaL runtime container. This allows the trainer (running inside a container) to spawn task containers on the host (examples/terminal_bench/README.md:79-84).
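
A minimal sketch of what such a mount looks like when assembling a `docker run` command, assuming the conventional socket path /var/run/docker.sock. The image name is a hypothetical placeholder, and the helper names are invented for this example; the README's recommended mounts may include more than the socket.

```python
import os
import stat
from typing import List

DOCKER_SOCK = "/var/run/docker.sock"  # conventional default path; an assumption here


def docker_run_args(image: str, sock: str = DOCKER_SOCK) -> List[str]:
    """Build a `docker run` argv that bind-mounts the host Docker socket."""
    return ["docker", "run", "--rm", "-it", "-v", f"{sock}:{sock}", image]


def socket_available(sock: str = DOCKER_SOCK) -> bool:
    """Return True if the path exists and is a Unix socket."""
    try:
        return stat.S_ISSOCK(os.stat(sock).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    # "areal-runtime:example" is a hypothetical image name.
    print(" ".join(docker_run_args("areal-runtime:example")))
    print("socket visible:", socket_available())
```

Running `socket_available()` inside the runtime container is a quick way to confirm the mount took effect before launching training.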

Recommended Mounts


Sources: examples/terminal_bench/README.md:95-99

Dependency Management

The project uses a dedicated pyproject.toml for terminal-specific dependencies, including camel-ai and terminal-bench (examples/terminal_bench/pyproject.toml:10-18).
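
One way to confirm these dependencies are present in the active environment is to query installed distribution metadata from the standard library. This is a small sketch; `installed_versions` is a hypothetical helper, and the two distribution names are the ones mentioned above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for dist, version in installed_versions(["camel-ai", "terminal-bench"]).items():
        print(f"{dist}: {version or 'not installed'}")
```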

Terminal Agent Workflow Sequence


Sources: examples/terminal_bench/agent/camel_terminal_agent.py:66-162, examples/terminal_bench/README.md:33-46