VOOZH about

URL: https://deepwiki.com/inclusionAI/AReaL/14.4-agent-workflows

⇱ Agent Workflows | inclusionAI/AReaL | DeepWiki


Loading...
Last indexed: 7 May 2026 (2e12c1)
Menu

Agent Workflows

This page provides technical documentation for building agentic RL workflows with AReaL. It demonstrates how to integrate external agentic frameworks (e.g., OpenAI Agents SDK, LangChain, Anthropic, ZeroClaw) with AReaL's reinforcement learning training system using the OpenAI-compatible API and proxy-based patterns.

For the underlying workflow API contract, see RolloutWorkflow API and Implementing Custom Workflows This page focuses on end-to-end examples and integration patterns using OpenAIProxyWorkflow areal/experimental/openai/proxy/workflow.py72

Overview

Agent workflows in AReaL enable training agents that interact with tools, APIs, or complex environments. AReaL supports several execution modes for agents, allowing them to run in the same process, in separate subprocesses, or as part of an online serving architecture areal/experimental/openai/proxy/workflow.py84-112

Key Execution Modes:

ModeDescriptionUse Case
inlineAgent runs in the same process as the rollout worker via asyncio.Simple agents, low overhead. areal/experimental/openai/proxy/workflow.py123-130
subprocAgent runs in a separate process via ProcessPoolExecutor.CPU-intensive agents or those with conflicting dependencies. areal/experimental/openai/proxy/workflow.py131-145
onlineAgent runs as an external service waiting for user sessions.Production-like serving and human-in-the-loop (HITL) training. areal/experimental/openai/proxy/workflow.py146-153

Sources: areal/experimental/openai/proxy/workflow.py84-154 examples/openclaw/README.md195-208

OpenAI Proxy Workflow Architecture

The OpenAIProxyWorkflow acts as a bridge between the RL trainer and the agent logic. It manages session lifecycles, grants capacity to the proxy server via _grant_capacity areal/experimental/openai/proxy/workflow.py156-160 and exports interactions for training.

Agent Execution and Proxy Interaction


Sources: areal/experimental/openai/proxy/workflow.py72-154 areal/experimental/openai/proxy/workflow.py156-184 examples/openclaw/README.md195-208

Implementing Agents for AReaL

Agents are implemented as classes with an asynchronous run method areal/experimental/openai/proxy/workflow.py102-111 AReaL provides a wrapper AsyncRewardWrapper areal/workflow/openai/math_agent.py18 to ensure reward functions are compatible with the async execution environment.

Math Agent Example (Direct OpenAI)

A simple single-turn agent using the standard openai library areal/workflow/openai/math_agent.py37-42


Sources: areal/workflow/openai/math_agent.py27-47 areal/workflow/openai/math_agent.py18

Multi-Turn Agent Example

For multi-turn interactions, the agent maintains state in the messages list and can return a dictionary of rewards keyed by completion ID areal/workflow/openai/math_agent.py65-86

Multi-Turn Logic Flow


Sources: areal/workflow/openai/math_agent.py50-86

Framework Integrations

AReaL's proxy architecture allows it to support virtually any agent framework by injecting the proxy URL and session API key.

1. OpenAI Agents SDK

The OpenAI Agents SDK (referred to as OpenAIRunner) can be used to build multi-agent handoff workflows areal/workflow/openai/math_agent.py143-164 AReaL tracks the entire interaction chain through the proxy.


Sources: areal/workflow/openai/math_agent.py143-164

2. Anthropic Integration

AReaL's proxy can handle Anthropic-style requests by using the anthropic python client pointed at the proxy areal/workflow/anthropic/math_agent.py40-45 It handles the conversion from OpenAI-style messages to Anthropic format areal/workflow/anthropic/math_agent.py48-61

Sources: areal/workflow/anthropic/math_agent.py17-80

3. External Agent Runtimes (ZeroClaw)

For complex agents running outside the Python environment, AReaL provides a ProxyGateway examples/openclaw/README.md55-60 Users start sessions via start_session.py and assign rewards via set_reward.py examples/openclaw/README.md183-185

External Session Lifecycle

  1. Start Session: POST /rl/start_session to get a session_id and api_key examples/openclaw/README.md101-110
  2. Interact: Agent calls proxy using the session key examples/openclaw/README.md134-140
  3. Reward: Set reward for the session examples/openclaw/README.md183-185
  4. Refresh: Calling start_session again with the same key exports the previous trajectory and starts a new one examples/openclaw/README.md197-204

Sources: examples/openclaw/README.md86-208

Tool-Integrated Reasoning (TIR)

Agents can be equipped with tools defined using the @function_tool decorator areal/workflow/openai/math_agent.py89

Tool NameImplementationPurpose
adda + bAddition
subtracta - bSubtraction
multiplya * bMultiplication
dividea / bDivision (with zero check)
powera ** bExponentiation
sqrta ** 0.5Square root

Sources: areal/workflow/openai/math_agent.py89-127

Customer Service Agents (Tau2)

The Tau2 benchmark example demonstrates complex agent training where the agent interacts with a user simulator to resolve domain-specific requests (airline, retail, telecom) examples/tau2/README.md3-9

Tau2 Workflow Components:

Tau2 Simulation Flow


Sources: examples/tau2/README.md1-123 examples/tau2/config_8b_airline.yaml122-133

Configuration and Hyperparameters

Agent workflows are configured via the rollout.agent section in the YAML configuration examples/openclaw/config.yaml34-40

ParameterDefaultDescription
modeinlineExecution mode (inline, subproc, online).
tool_call_parserqwenParser for extracting tool calls from text.
reasoning_parserqwen3Parser for identifying reasoning blocks.
export_styleindividualHow to export interactions (individual or concat).
turn_discount1.0Discount factor for multi-turn rewards.
admin_api_key-Key for administrative tasks (starting sessions).

Sources: examples/openclaw/config.yaml34-40 examples/tau2/config_8b_airline.yaml43-49

Execution Details

Subprocess Management

When mode="subproc", AReaL uses a ProcessPoolExecutor areal/experimental/openai/proxy/workflow.py31 to isolate agent execution. This prevents long-running agent logic from blocking the rollout worker's event loop.

Capacity Granting

To prevent unauthorized access or stale requests, the workflow must explicitly grant capacity to the proxy server before an agent session starts areal/experimental/openai/proxy/workflow.py171-178

Sources: areal/experimental/openai/proxy/workflow.py156-178 areal/experimental/openai/proxy/workflow.py20