Last indexed: 7 May 2026 (2e12c1)

Terminal Bench Agent Training

This page describes how to implement and configure training for terminal agents on the Terminal Bench benchmark within the AReaL framework. The integration enables reinforcement learning (RL) on agents that interact with real operating system environments through a terminal, using Docker for task isolation and CAMEL for agent scaffolding.

The Terminal Bench implementation in AReaL is located in examples/terminal_bench/ and is designed to scale across GPU and NPU (Ascend) clusters using the PPO or GRPO algorithms (examples/terminal_bench/README.md:5-11).

System Architecture

The Terminal Bench training pipeline bridges high-level agent logic with low-level environment execution. It uses a specialized RolloutWorkflow to manage the lifecycle of Docker-based task environments.

Core Components

Component	Code Entity	Responsibility
Training Entry	`train.py`	Entry point that loads config, builds the dataset, and launches AReaL training (examples/terminal_bench/README.md:35-36)
Rollout Workflow	`CamelRLVRWorkflow`	Rollout workflow that builds task images, runs trajectories, and collects rewards (examples/terminal_bench/README.md:37-39)
Agent Wrapper	`CamelTerminalAgent`	Manages the CAMEL `ChatAgent`, tool definitions, and task-specific environment resets (examples/terminal_bench/agent/camel_terminal_agent.py:38-50)
Environment	`Terminal` / `DockerComposeManager`	Provides the interface to the Dockerized terminal and manages container lifecycle (examples/terminal_bench/agent/camel_terminal_agent.py:19-20)
Traced Agent	`ChatAgentTrace`	A `ChatAgent` subclass with performance tracing and JSON parse error handling (examples/terminal_bench/agent/chat_agent_trace.py:67-73)

Data Flow and Entity Mapping

The following diagram shows how AReaL's training abstractions map to the Terminal Bench execution entities.

Figure: Entity Interaction Diagram

Sources: examples/terminal_bench/README.md:33-46, examples/terminal_bench/agent/camel_terminal_agent.py:38-72, examples/terminal_bench/agent/chat_agent_trace.py:67-73

Agent Implementation

The agent logic relies on the CamelTerminalAgent class, which handles the transition from natural language instructions to terminal actions.

Task Execution Lifecycle

The run_agent method follows a strict sequence to ensure environment consistency:

1. Environment Reset: Calls _reset_env to start a fresh Docker container via DockerComposeManager. This step is bounded by a timeout defined in TaskTimeouts (examples/terminal_bench/agent/camel_terminal_agent.py:84-93).
2. Agent Initialization: Configures the ChatAgentTrace with a system prompt and tools via _reset_agent (examples/terminal_bench/agent/camel_terminal_agent.py:96-104).
3. Step Execution: Invokes agent.astep(prompt), during which the agent generates thoughts and tool calls (examples/terminal_bench/agent/camel_terminal_agent.py:111).
4. Reward Evaluation: Executes _evaluate_completion_sync to check whether the task goal was achieved and sets the reward in the client (examples/terminal_bench/agent/camel_terminal_agent.py:133-138).
5. Cleanup: Shuts down the Docker environment using _close_env (examples/terminal_bench/agent/camel_terminal_agent.py:150-157).
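
The sequence above can be sketched with plain asyncio. This is a minimal illustration, not the code in camel_terminal_agent.py: SketchTerminalAgent, its stubbed method bodies, the TaskTimeouts field names, and the _astep helper are assumptions made so the example runs on its own. Only the step order, the method names _reset_env, _reset_agent, _evaluate_completion_sync, and _close_env, and the timeout values come from this page. The sketch returns the reward instead of setting it in a client.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TaskTimeouts:
    # Limits in seconds, as listed in the configuration table on this page.
    # The field names are assumed for this sketch.
    reset_env: float = 1800.0
    agent_astep: float = 300.0
    evaluate: float = 1200.0


@dataclass
class SketchTerminalAgent:
    """Hypothetical stand-in for CamelTerminalAgent; every body is a stub."""

    timeouts: TaskTimeouts = field(default_factory=TaskTimeouts)
    events: list = field(default_factory=list)

    async def _reset_env(self):
        self.events.append("reset_env")  # real code would start a container

    def _reset_agent(self):
        self.events.append("reset_agent")  # real code would set prompt and tools

    async def _astep(self, prompt):
        self.events.append("astep")  # real code would call agent.astep(prompt)
        return f"response to: {prompt}"

    def _evaluate_completion_sync(self):
        self.events.append("evaluate")  # real code would run the task checks
        return 1.0

    def _close_env(self):
        self.events.append("close_env")  # real code would stop the container

    async def run_agent(self, prompt):
        reward = 0.0
        try:
            await asyncio.wait_for(self._reset_env(), self.timeouts.reset_env)
            self._reset_agent()
            await asyncio.wait_for(self._astep(prompt), self.timeouts.agent_astep)
            # The evaluation is synchronous, so run it in a worker thread.
            reward = await asyncio.wait_for(
                asyncio.to_thread(self._evaluate_completion_sync),
                self.timeouts.evaluate,
            )
        except asyncio.TimeoutError:
            self.events.append("timeout")
        finally:
            # Cleanup runs whether the episode succeeded, failed, or timed out.
            self._close_env()
        return reward


agent = SketchTerminalAgent()
final_reward = asyncio.run(agent.run_agent("list files in /tmp"))
```

Putting the cleanup call in a finally block is one way to guarantee that a task container is released even when a step exceeds its limit.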

Performance Tracing and Error Handling

ChatAgentTrace extends the standard CAMEL agent to include:

JSON Parse Error Detection: The adetect_tool_calls_parse_error method identifies malformed tool calls (e.g., from Qwen models). It creates an error tool-call record so that the agent can see the error and attempt self-correction (examples/terminal_bench/agent/chat_agent_trace.py:76-140).
Memory Updates: When errors occur, the agent's memory is updated with FunctionCallingMessage objects representing the failure (examples/terminal_bench/agent/chat_agent_trace.py:153-184).
Scope Tracing: Uses atrace_scope and atrace_session_phase to log the duration of resets, agent steps, and evaluations for performance analysis (examples/terminal_bench/agent/camel_terminal_agent.py:84-130).
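
The parse-error idea can be illustrated without CAMEL. The sketch below is not the adetect_tool_calls_parse_error implementation: the function name, the dictionary layout of a tool call, and the shell_exec tool name are hypothetical. It shows only the general technique of turning a JSON decoding failure into a record that can be fed back to the model.

```python
import json


def detect_tool_call_parse_errors(tool_calls):
    """Split raw tool calls into parsed calls and error records.

    Each raw call is assumed to be a dict with "id", "name", and a JSON
    string under "arguments" (a hypothetical layout for this sketch).
    """
    parsed, errors = [], []
    for call in tool_calls:
        try:
            args = json.loads(call["arguments"])
        except json.JSONDecodeError as exc:
            # Keep the failure as a record the agent can read on its next turn.
            errors.append(
                {
                    "tool_name": call["name"],
                    "tool_call_id": call["id"],
                    "result": f"Error: arguments are not valid JSON ({exc.msg})",
                }
            )
            continue
        parsed.append(
            {"tool_name": call["name"], "tool_call_id": call["id"], "args": args}
        )
    return parsed, errors


raw_calls = [
    {"id": "call_1", "name": "shell_exec", "arguments": '{"command": "ls"}'},
    # Missing closing brace, as a model might emit when it truncates output.
    {"id": "call_2", "name": "shell_exec", "arguments": '{"command": "ls"'},
]
parsed_calls, error_records = detect_tool_call_parse_errors(raw_calls)
```

Returning the error as data rather than raising keeps the episode alive, which matches the self-correction behavior described above.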

Sources: examples/terminal_bench/agent/camel_terminal_agent.py:66-162, examples/terminal_bench/agent/chat_agent_trace.py:76-188

Configuration and NPU Support

AReaL supports both NVIDIA GPUs (via SGLang) and Ascend NPUs (via vLLM) for Terminal Bench training.

Configuration Structure

The configuration extends GRPOConfig with terminal-specific parameters defined in AgentRLConfig (examples/terminal_bench/agent_rl_config.py:16-26).

Parameter	Description
`n_trajs`	Number of trajectories per task (examples/terminal_bench/agent_rl_config.py:17)
`max_iteration`	Maximum number of tool-use turns (examples/terminal_bench/agent_rl_config.py:19)
`non_think_mode`	Toggle for specific prompt formatting (e.g., for non-reasoning models) (examples/terminal_bench/agent_rl_config.py:21)
`task_timeouts`	Dataclass defining limits for `_reset_env` (1800 s), `agent_astep` (300 s), and `_evaluate` (1200 s) (examples/terminal_bench/agent_rl_config.py:7-12)
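
As a rough picture of how these parameters might fit together, the sketch below models them with dataclasses. It is an assumption-laden illustration, not the contents of agent_rl_config.py: GRPOConfigStandIn replaces AReaL's real GRPOConfig so the example is self-contained, the TaskTimeouts field names are guessed, and the default values for n_trajs, max_iteration, and non_think_mode are placeholders. Only the parameter names and the three timeout values come from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class GRPOConfigStandIn:
    """Placeholder for AReaL's GRPOConfig so this sketch runs on its own."""

    experiment_name: str = "terminal-bench-sketch"


@dataclass
class TaskTimeouts:
    # Seconds; values from the table above, field names assumed.
    reset_env: float = 1800.0
    agent_astep: float = 300.0
    evaluate: float = 1200.0


@dataclass
class AgentRLConfig(GRPOConfigStandIn):
    # Defaults below are placeholders, not the project's real defaults.
    n_trajs: int = 1
    max_iteration: int = 10
    non_think_mode: bool = False
    # default_factory gives each config its own TaskTimeouts instance.
    task_timeouts: TaskTimeouts = field(default_factory=TaskTimeouts)


config = AgentRLConfig(n_trajs=4, max_iteration=20)
```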

NPU Integration

For Ascend NPU clusters, config_tb_vllm_npu.yaml selects the vLLM backend. It requires specific environment variables for HCCL and ACL stability, such as HCCL_EXEC_TIMEOUT and ACL_DEVICE_SYNC_TIMEOUT (examples/terminal_bench/config_tb_vllm_npu.yaml:89-100).
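
Before launching an NPU run, it can help to check that the expected variables are present. The helper below is a hypothetical pre-flight check, not part of AReaL. It covers only the two variable names mentioned above, which the text presents as examples rather than a complete list, and it deliberately does not suggest values, since those are defined in config_tb_vllm_npu.yaml.

```python
import os

# Only the two names mentioned on this page; the YAML file may set more.
REQUIRED_NPU_ENV_VARS = ("HCCL_EXEC_TIMEOUT", "ACL_DEVICE_SYNC_TIMEOUT")


def missing_npu_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_NPU_ENV_VARS if not env.get(name)]


# Example with an explicit mapping; the value is a placeholder, not a
# recommended setting.
example_env = {"HCCL_EXEC_TIMEOUT": "<seconds>"}
missing = missing_npu_env_vars(example_env)
```

Calling missing_npu_env_vars() with no argument inspects the current process environment instead.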

NPU-Specific vLLM Config

Sources: examples/terminal_bench/config_tb_vllm_npu.yaml:127-134, examples/terminal_bench/agent_rl_config.py:6-26

Environment Setup

Terminal Bench tasks require the host's Docker socket to be mounted into the AReaL runtime container. This allows the trainer, which itself runs inside a container, to spawn task containers on the host (examples/terminal_bench/README.md:79-84).
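
The socket mount can be expressed as an argument list for docker run. The sketch below is illustrative only: the helper names and the image tag areal-runtime:example are hypothetical, and /var/run/docker.sock is Docker's default socket path on Linux, which may differ on a given host. It does not reproduce the mounts recommended in the README.

```python
from pathlib import Path

# Docker's default Unix socket on Linux; adjust if the host uses another path.
DOCKER_SOCKET = "/var/run/docker.sock"


def docker_run_args(image, socket_path=DOCKER_SOCKET):
    """Build a docker run command that bind-mounts the host Docker socket."""
    return [
        "docker",
        "run",
        "--rm",
        "-v",
        f"{socket_path}:{socket_path}",
        image,
    ]


def socket_is_visible(socket_path=DOCKER_SOCKET):
    """Check, from inside the runtime container, that the socket was mounted."""
    return Path(socket_path).exists()


# The image tag is a placeholder for whatever runtime image is in use.
args = docker_run_args("areal-runtime:example")
```

Because the mounted socket talks to the host daemon, containers started by the trainer are siblings of the runtime container rather than children nested inside it.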

Recommended Mounts

Sources: examples/terminal_bench/README.md:95-99

Dependency Management

The example uses a dedicated pyproject.toml for terminal-specific dependencies, including camel-ai and terminal-bench (examples/terminal_bench/pyproject.toml:10-18).

Figure: Terminal Agent Workflow Sequence

Sources: examples/terminal_bench/agent/camel_terminal_agent.py:66-162, examples/terminal_bench/README.md:33-46

URL: https://deepwiki.com/inclusionAI/AReaL/14.13-terminal-bench-agent-training