Last indexed: 7 May 2026 (2e12c1)

Agent Workflows

This page provides technical documentation for building agentic RL workflows with AReaL. It demonstrates how to integrate external agentic frameworks (e.g., OpenAI Agents SDK, LangChain, Anthropic, ZeroClaw) with AReaL's reinforcement learning training system using the OpenAI-compatible API and proxy-based patterns.

For the underlying workflow API contract, see RolloutWorkflow API and Implementing Custom Workflows This page focuses on end-to-end examples and integration patterns using OpenAIProxyWorkflow areal/experimental/openai/proxy/workflow.py72

Overview

Agent workflows in AReaL enable training agents that interact with tools, APIs, or complex environments. AReaL supports several execution modes for agents, allowing them to run in the same process, in separate subprocesses, or as part of an online serving architecture areal/experimental/openai/proxy/workflow.py84-112

Key Execution Modes:

Mode	Description	Use Case
`inline`	Agent runs in the same process as the rollout worker via `asyncio`.	Simple agents, low overhead. areal/experimental/openai/proxy/workflow.py123-130
`subproc`	Agent runs in a separate process via `ProcessPoolExecutor`.	CPU-intensive agents or those with conflicting dependencies. areal/experimental/openai/proxy/workflow.py131-145
`online`	Agent runs as an external service waiting for user sessions.	Production-like serving and human-in-the-loop (HITL) training. areal/experimental/openai/proxy/workflow.py146-153

Sources: areal/experimental/openai/proxy/workflow.py84-154 examples/openclaw/README.md195-208

OpenAI Proxy Workflow Architecture

The OpenAIProxyWorkflow acts as a bridge between the RL trainer and the agent logic. It manages session lifecycles, grants capacity to the proxy server via _grant_capacity areal/experimental/openai/proxy/workflow.py156-160 and exports interactions for training.

Agent Execution and Proxy Interaction

Sources: areal/experimental/openai/proxy/workflow.py72-154 areal/experimental/openai/proxy/workflow.py156-184 examples/openclaw/README.md195-208

Implementing Agents for AReaL

Agents are implemented as classes with an asynchronous run method areal/experimental/openai/proxy/workflow.py102-111 AReaL provides a wrapper AsyncRewardWrapper areal/workflow/openai/math_agent.py18 to ensure reward functions are compatible with the async execution environment.

Math Agent Example (Direct OpenAI)

A simple single-turn agent using the standard openai library areal/workflow/openai/math_agent.py37-42

Sources: areal/workflow/openai/math_agent.py27-47 areal/workflow/openai/math_agent.py18

Multi-Turn Agent Example

For multi-turn interactions, the agent maintains state in the messages list and can return a dictionary of rewards keyed by completion ID areal/workflow/openai/math_agent.py65-86

Multi-Turn Logic Flow

Sources: areal/workflow/openai/math_agent.py50-86

Framework Integrations

AReaL's proxy architecture allows it to support virtually any agent framework by injecting the proxy URL and session API key.

1. OpenAI Agents SDK

The OpenAI Agents SDK (referred to as OpenAIRunner) can be used to build multi-agent handoff workflows areal/workflow/openai/math_agent.py143-164 AReaL tracks the entire interaction chain through the proxy.

Sources: areal/workflow/openai/math_agent.py143-164

2. Anthropic Integration

AReaL's proxy can handle Anthropic-style requests by using the anthropic python client pointed at the proxy areal/workflow/anthropic/math_agent.py40-45 It handles the conversion from OpenAI-style messages to Anthropic format areal/workflow/anthropic/math_agent.py48-61

Sources: areal/workflow/anthropic/math_agent.py17-80

3. External Agent Runtimes (ZeroClaw)

For complex agents running outside the Python environment, AReaL provides a ProxyGateway examples/openclaw/README.md55-60 Users start sessions via start_session.py and assign rewards via set_reward.py examples/openclaw/README.md183-185

External Session Lifecycle

Start Session: POST /rl/start_session to get a session_id and api_key examples/openclaw/README.md101-110
Interact: Agent calls proxy using the session key examples/openclaw/README.md134-140
Reward: Set reward for the session examples/openclaw/README.md183-185
Refresh: Calling start_session again with the same key exports the previous trajectory and starts a new one examples/openclaw/README.md197-204

Sources: examples/openclaw/README.md86-208

Tool-Integrated Reasoning (TIR)

Agents can be equipped with tools defined using the @function_tool decorator areal/workflow/openai/math_agent.py89

Tool Name	Implementation	Purpose
`add`	`a + b`	Addition
`subtract`	`a - b`	Subtraction
`multiply`	`a * b`	Multiplication
`divide`	`a / b`	Division (with zero check)
`power`	`a ** b`	Exponentiation
`sqrt`	`a ** 0.5`	Square root

Sources: areal/workflow/openai/math_agent.py89-127

Customer Service Agents (Tau2)

The Tau2 benchmark example demonstrates complex agent training where the agent interacts with a user simulator to resolve domain-specific requests (airline, retail, telecom) examples/tau2/README.md3-9

Tau2 Workflow Components:

Tau2AgentWorkflow: Orchestrates simulations using self-hosted inference servers examples/tau2/README.md13-18
User Simulator: An external LLM (e.g., Qwen2.5-72B) acting as the customer examples/tau2/README.md62-74
Tau2EnvConfig: Configuration for domain, max steps, and penalties examples/tau2/README.md110-123

Tau2 Simulation Flow

Sources: examples/tau2/README.md1-123 examples/tau2/config_8b_airline.yaml122-133

Configuration and Hyperparameters

Agent workflows are configured via the rollout.agent section in the YAML configuration examples/openclaw/config.yaml34-40

Parameter	Default	Description
`mode`	`inline`	Execution mode (`inline`, `subproc`, `online`).
`tool_call_parser`	`qwen`	Parser for extracting tool calls from text.
`reasoning_parser`	`qwen3`	Parser for identifying reasoning blocks.
`export_style`	`individual`	How to export interactions (`individual` or `concat`).
`turn_discount`	`1.0`	Discount factor for multi-turn rewards.
`admin_api_key`	-	Key for administrative tasks (starting sessions).

Sources: examples/openclaw/config.yaml34-40 examples/tau2/config_8b_airline.yaml43-49

Execution Details

Subprocess Management

When mode="subproc", AReaL uses a ProcessPoolExecutor areal/experimental/openai/proxy/workflow.py31 to isolate agent execution. This prevents long-running agent logic from blocking the rollout worker's event loop.

Executor Initialization: _get_executor(max_workers) areal/experimental/openai/proxy/workflow.py36-54
Environment Injection: Proxy address and session keys are passed via environment variables (OPENAI_BASE_URL, OPENAI_API_KEY, etc.) areal/experimental/openai/proxy/workflow.py132-137

Capacity Granting

To prevent unauthorized access or stale requests, the workflow must explicitly grant capacity to the proxy server before an agent session starts areal/experimental/openai/proxy/workflow.py171-178

Function: _grant_capacity(session) areal/experimental/openai/proxy/workflow.py156-160
Endpoint: GRANT_CAPACITY_PATHNAME areal/experimental/openai/proxy/workflow.py20

Sources: areal/experimental/openai/proxy/workflow.py156-178 areal/experimental/openai/proxy/workflow.py20

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/14.4-agent-workflows

⇱ Agent Workflows | inclusionAI/AReaL | DeepWiki