Last indexed: 7 May 2026 (2e12c1)

OpenClaw and External Agent Runtimes

The OpenClaw and External Agent Runtime integration allows AReaL to function as a high-performance backend for complex, multi-turn agentic workflows. Instead of defining agent logic within AReaL's Python process, AReaL exposes an OpenAI-compatible proxy gateway. This allows external runtimes—such as OpenClaw, ZeroClaw, or custom human-in-the-loop interfaces—to interact with the model while AReaL transparently captures trajectories, log-probabilities, and rewards for Reinforcement Learning (RL) training examples/openclaw/README.md3-7

System Architecture and Data Flow

The architecture centers around the Proxy Gateway, which acts as a middleman between the external agent runtime and the AReaL inference engines. This setup is specifically designed for the online rollout mode examples/openclaw/config.yaml35

Key Components

Proxy Gateway: A FastAPI-based server that routes requests from external applications to backend workers. It handles session management, authentication, and load balancing examples/openclaw/README.md55-57
Proxy Workers: Backend components colocated with rollout workers. Each worker manages sessions, records token-level data (logprobs), and exports trajectories for training examples/openclaw/README.md161-163
External Agent Runtime: Any system (e.g., ZeroClaw) that communicates via the OpenAI chat-completions protocol examples/openclaw/README.md3-6
Inference Servers: Backend engines (SGLang or vLLM) that perform the actual LLM inference and provide log-probabilities examples/openclaw/config.yaml99-115

Request Lifecycle Diagram

This diagram illustrates how a request from an external agent is processed and captured for RL training, mapping system concepts to specific code entities.

Title: External Agent Interaction Flow (Code Mapping)

Sources: examples/openclaw/README.md161-163 examples/openclaw/config.yaml91-96 examples/openclaw/README.md55-60

Session Management and Lifecycle

Unlike standard stateless LLM APIs, the Proxy Gateway maintains session state to support RL. A "Session" corresponds to one RL episode or trajectory examples/openclaw/README.md83-85

Session Transitions

Start Session: An admin initiates a session via /rl/start_session (triggered by start_session.py). This returns a unique session_id and a api_key for the agent examples/openclaw/README.md91-108
Interaction: The external agent uses the session API key. AReaL tracks these interactions, storing logprobs and model versions examples/openclaw/README.md161-163
Reward Assignment: Rewards are assigned via /rl/set_reward (triggered by set_reward.py). This provides the scalar signal for the RL algorithm examples/openclaw/README.md183-185
Session Refresh: Calling start_session.py with an existing api_key triggers a refresh. The gateway ends the old session, exports the trajectory to the RL trainer, and starts a fresh session bound to the same key examples/openclaw/README.md197-205

Title: Session State and Interaction Tracking

Sources: examples/openclaw/README.md197-205 examples/openclaw/set_reward.py48-51 examples/openclaw/start_session.py101-107

Implementation and Configuration

Configuration Parameters

The rollout.agent section in config.yaml controls the proxy behavior for agentic RL.

Parameter	Description
`mode`	Set to `online` for external runtime support examples/openclaw/config.yaml35
`admin_api_key`	Secret key for managing sessions and starting episodes examples/openclaw/config.yaml40
`tool_call_parser`	Parser for extracting tool calls (e.g., `qwen`) examples/openclaw/config.yaml36
`turn_discount`	Discount factor for multi-turn reward propagation examples/openclaw/config.yaml39
`export_style`	Style for exporting trajectories (`individual` or `concat`) examples/openclaw/config.yaml38

Authentication Tiers

The system uses a two-tier authentication model:

Admin API Key: Configured in YAML via rollout.agent.admin_api_key. Used for administrative endpoints like /rl/start_session examples/openclaw/README.md49
Session API Key: Issued dynamically by the gateway. Used as a Bearer token for standard OpenAI-compatible endpoints (/chat/completions) and /rl/set_reward examples/openclaw/README.md115-122

Interaction Handling and Trajectories

AReaL ensures that interactions within a session are correctly associated even in complex multi-turn scenarios.

Trajectory Collection: RL training requires (input, output, reward) tuples. An episode may contain multiple LLM interactions, which the gateway collects into a single trajectory examples/openclaw/README.md83-85
Model Updates: After every training step, the updated model weights are synchronized. The proxy gateway ensures the agent always interacts with the latest policy examples/openclaw/README.md161-163
Concurrency: Multiple agent sessions can run concurrently. The unique session API keys allow the gateway to differentiate trajectories and route rewards to the correct sequence examples/openclaw/README.md152-157

Title: Natural Language Space to Code Entity Mapping

Sources: examples/openclaw/README.md83-85 examples/openclaw/config.yaml34-40 examples/openclaw/README.md115-122

Sources:

examples/openclaw/README.md
examples/openclaw/config.yaml
examples/openclaw/start_session.py
examples/openclaw/set_reward.py
examples/agent_workflow/README.md

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/14.10-openclaw-and-external-agent-runtimes