VOOZH about

URL: https://deepwiki.com/inclusionAI/AReaL/14.10-openclaw-and-external-agent-runtimes

⇱ OpenClaw and External Agent Runtimes | inclusionAI/AReaL | DeepWiki


Loading...
Last indexed: 7 May 2026 (2e12c1)
Menu

OpenClaw and External Agent Runtimes

The OpenClaw and External Agent Runtime integration allows AReaL to function as a high-performance backend for complex, multi-turn agentic workflows. Instead of defining agent logic within AReaL's Python process, AReaL exposes an OpenAI-compatible proxy gateway. This allows external runtimes—such as OpenClaw, ZeroClaw, or custom human-in-the-loop interfaces—to interact with the model while AReaL transparently captures trajectories, log-probabilities, and rewards for Reinforcement Learning (RL) training examples/openclaw/README.md3-7

System Architecture and Data Flow

The architecture centers around the Proxy Gateway, which acts as a middleman between the external agent runtime and the AReaL inference engines. This setup is specifically designed for the online rollout mode examples/openclaw/config.yaml35

Key Components

  • Proxy Gateway: A FastAPI-based server that routes requests from external applications to backend workers. It handles session management, authentication, and load balancing examples/openclaw/README.md55-57
  • Proxy Workers: Backend components colocated with rollout workers. Each worker manages sessions, records token-level data (logprobs), and exports trajectories for training examples/openclaw/README.md161-163
  • External Agent Runtime: Any system (e.g., ZeroClaw) that communicates via the OpenAI chat-completions protocol examples/openclaw/README.md3-6
  • Inference Servers: Backend engines (SGLang or vLLM) that perform the actual LLM inference and provide log-probabilities examples/openclaw/config.yaml99-115

Request Lifecycle Diagram

This diagram illustrates how a request from an external agent is processed and captured for RL training, mapping system concepts to specific code entities.

Title: External Agent Interaction Flow (Code Mapping)


Sources: examples/openclaw/README.md161-163 examples/openclaw/config.yaml91-96 examples/openclaw/README.md55-60

Session Management and Lifecycle

Unlike standard stateless LLM APIs, the Proxy Gateway maintains session state to support RL. A "Session" corresponds to one RL episode or trajectory examples/openclaw/README.md83-85

Session Transitions

  1. Start Session: An admin initiates a session via /rl/start_session (triggered by start_session.py). This returns a unique session_id and a api_key for the agent examples/openclaw/README.md91-108
  2. Interaction: The external agent uses the session API key. AReaL tracks these interactions, storing logprobs and model versions examples/openclaw/README.md161-163
  3. Reward Assignment: Rewards are assigned via /rl/set_reward (triggered by set_reward.py). This provides the scalar signal for the RL algorithm examples/openclaw/README.md183-185
  4. Session Refresh: Calling start_session.py with an existing api_key triggers a refresh. The gateway ends the old session, exports the trajectory to the RL trainer, and starts a fresh session bound to the same key examples/openclaw/README.md197-205

Title: Session State and Interaction Tracking


Sources: examples/openclaw/README.md197-205 examples/openclaw/set_reward.py48-51 examples/openclaw/start_session.py101-107

Implementation and Configuration

Configuration Parameters

The rollout.agent section in config.yaml controls the proxy behavior for agentic RL.

ParameterDescription
modeSet to online for external runtime support examples/openclaw/config.yaml35
admin_api_keySecret key for managing sessions and starting episodes examples/openclaw/config.yaml40
tool_call_parserParser for extracting tool calls (e.g., qwen) examples/openclaw/config.yaml36
turn_discountDiscount factor for multi-turn reward propagation examples/openclaw/config.yaml39
export_styleStyle for exporting trajectories (individual or concat) examples/openclaw/config.yaml38

Authentication Tiers

The system uses a two-tier authentication model:

  • Admin API Key: Configured in YAML via rollout.agent.admin_api_key. Used for administrative endpoints like /rl/start_session examples/openclaw/README.md49
  • Session API Key: Issued dynamically by the gateway. Used as a Bearer token for standard OpenAI-compatible endpoints (/chat/completions) and /rl/set_reward examples/openclaw/README.md115-122

Interaction Handling and Trajectories

AReaL ensures that interactions within a session are correctly associated even in complex multi-turn scenarios.

  • Trajectory Collection: RL training requires (input, output, reward) tuples. An episode may contain multiple LLM interactions, which the gateway collects into a single trajectory examples/openclaw/README.md83-85
  • Model Updates: After every training step, the updated model weights are synchronized. The proxy gateway ensures the agent always interacts with the latest policy examples/openclaw/README.md161-163
  • Concurrency: Multiple agent sessions can run concurrently. The unique session API keys allow the gateway to differentiate trajectories and route rewards to the correct sequence examples/openclaw/README.md152-157

Title: Natural Language Space to Code Entity Mapping


Sources: examples/openclaw/README.md83-85 examples/openclaw/config.yaml34-40 examples/openclaw/README.md115-122

Sources:

  • examples/openclaw/README.md
  • examples/openclaw/config.yaml
  • examples/openclaw/start_session.py
  • examples/openclaw/set_reward.py
  • examples/agent_workflow/README.md