Last indexed: 7 May 2026 (2e12c1)

ArealOpenAI Client

This document describes the ArealOpenAI client, which provides an OpenAI-compatible interface to AReaL's inference engines with integrated reward tracking and conversation management for agentic RL. The client wraps inference engines to expose both the Chat Completions API and the Responses API (OpenAI Agents SDK), automatically caching interactions with token-level metadata and supporting backward reward propagation.

For the broader context of agentic RL integration patterns, see Agentic RL Overview. For details on the underlying session tracking and conversation tree structures, see InteractionCache and Session Tracking. For reward assignment methods, see Reward Assignment and Discounting.

Purpose and Scope

The ArealOpenAI client serves as the primary interface for integrating AReaL with agent frameworks that use the OpenAI API standard. It enables:

OpenAI API Compatibility: Exposes chat.completions.create() and responses.create() methods compatible with standard OpenAI SDKs areal/experimental/openai/client.py213-332
Automatic Interaction Tracking: Caches all generations with token IDs, log probabilities, and metadata areal/experimental/openai/client.py568-586
Reward Management: Associates scalar rewards with interactions and propagates them backward through conversation history areal/experimental/openai/cache.py55-84
Conversation Tree Building: Automatically constructs parent-child relationships between multi-turn interactions using prefix matching areal/experimental/openai/cache.py112-171
Training Data Export: Generates properly formatted training data with loss masks for RL algorithms areal/experimental/openai/types.py143-201

Core Architecture

Class Hierarchy

The following diagram maps the Natural Language concepts of the OpenAI API to the specific Code Entities in AReaL.

Sources: areal/experimental/openai/client.py213-332 areal/experimental/openai/cache.py13-18 areal/experimental/openai/client.py65-67

Client Initialization

The ArealOpenAI client is initialized with an inference engine, tokenizer, and configuration parameters areal/experimental/openai/client.py213-238

Initialization Parameters

Parameter	Type	Description
`engine`	`_AsyncGenerateEngine`	Inference engine implementing `agenerate(ModelRequest) -> ModelResponse` areal/experimental/openai/client.py65-67
`tokenizer`	`PreTrainedTokenizerFast`	Tokenizer for applying chat templates and encoding/decoding areal/experimental/openai/client.py216
`tool_call_parser`	`str`	Parser type for extracting tool calls (e.g., "qwen25") areal/experimental/openai/client.py220
`reasoning_parser`	`str`	Parser type for extracting reasoning tokens (e.g., "qwen3") areal/experimental/openai/client.py221
`chat_template_type`	`str`	Template mode: `"hf"` (standard) or `"concat"` (tree building) areal/experimental/openai/client.py222
`engine_max_tokens`	`int	None`

Sources: areal/experimental/openai/client.py213-238

Request Flow

Chat Completions API Flow

The diagram below illustrates how a standard ChatCompletion request flows through the client and interacts with AReaL's internal entities.

Sources: areal/experimental/openai/client.py335-586 areal/experimental/openai/cache.py107-171

Chat Completions API

The AsyncCompletionsWithReward class extends OpenAI's AsyncCompletions to provide reward tracking areal/experimental/openai/client.py335

Method Signature

Token Limit Handling

The client applies multiple token limits in priority order to compute the final max_new_tokens sent to the InferenceEngine areal/experimental/openai/client.py446-481

max_total_tokens: Hard limit on total tokens (prompt + completion) areal/experimental/openai/client.py446
engine_max_tokens: Engine-level limit set during initialization areal/experimental/openai/client.py447
max_completion_tokens: Limit on generated tokens only areal/experimental/openai/client.py448

The effective max_new_tokens is computed by taking the minimum of available space in total/engine limits and the explicit completion token limit areal/experimental/openai/client.py470-481

Sources: areal/experimental/openai/client.py446-481

Responses API

The AsyncResponsesWithReward class extends OpenAI's AsyncResponses to support the Agents SDK format areal/experimental/openai/client.py716

Input Format Conversion

The Responses API accepts flexible input formats that are normalized to message lists using _ensure_message_dict_list areal/experimental/openai/client.py79-127

Sources: areal/experimental/openai/client.py760-833

Interaction Caching and Tracking

InteractionCache

The InteractionCache is an OrderedDict that automatically manages parent-child relationships between interactions areal/experimental/openai/cache.py13

Sources: areal/experimental/openai/cache.py107-171

Parent-Child Relationship Building

When a new interaction is added to the cache, the cache automatically searches for its parent by checking if parent.messages + parent.output_message_list is a strict prefix of new.messages areal/experimental/openai/cache.py159-162 This allows AReaL to reconstruct the conversation tree for multi-turn RL.

Chat Template Modes

Concat Mode (`chat_template_type="concat"`)

Advanced mode that concatenates parent's tokens with child's new tokens to maintain exact token alignment areal/experimental/openai/client.py143-210

The concat_prompt_token_ids_with_parent function:

Takes parent's full token sequence (input + output) areal/experimental/openai/client.py166-169
Applies chat template to full conversation (parent + child messages) areal/experimental/openai/client.py192-197
Finds split point by matching token IDs to extract only new tokens for the child areal/experimental/openai/client.py199-209

Sources: areal/experimental/openai/client.py143-210

Tool Call Support

The client automatically parses and structures tool calls from model outputs using process_tool_calls areal/experimental/openai/tool_call_parser.py55

Supported Parsers

The _SGLANG_TO_VLLM_TOOL_PARSER mapping ensures compatibility between SGLang and vLLM parser names areal/experimental/openai/tool_call_parser.py18-31

Parser	Format	Example
`qwen25`	`<tool_call>\n{json}\n</tool_call>`	For Qwen 2.5 models areal/experimental/openai/tool_call_parser.py55
`qwen3`	`<thought>` for reasoning	For Qwen 3 models areal/experimental/openai/tool_call_parser.py55

Sources: areal/experimental/openai/client.py525-540 areal/experimental/openai/tool_call_parser.py18-55

Reward Management

Backward Reward Propagation

The apply_reward_discount() method propagates rewards backward through conversation history using geometric discounting areal/experimental/openai/cache.py55-84

Algorithm:

Sources: areal/experimental/openai/cache.py55-105

Exporting Interactions for Training

The client exports cached interactions in different styles for training scenarios areal/experimental/openai/cache.py173-261

Style	Description	Use Case
`"individual"`	Returns all cached interactions as-is areal/experimental/openai/cache.py214	Standard RL training (PPO/GRPO)
`"concat"`	Returns only leaf nodes with full conversation sequences areal/experimental/openai/cache.py218	Tree-based RL or multi-turn trajectories

The to_tensor_dict() method in InteractionWithTokenLogpReward handles the heavy lifting of constructing logprobs, loss masks, and version tensors for training, including logic to mask out parent tokens in concat mode areal/experimental/openai/types.py143-201

Sources: areal/experimental/openai/cache.py173-261 areal/experimental/openai/types.py143-201

Proxy Server Architecture

For scenarios where agent runtimes are external to the AReaL training process, AReaL provides a proxy server that implements the OpenAI protocol and manages sessions via SessionData areal/experimental/openai/proxy/server.py66-123

Session Management

SessionData wraps an InteractionCache and tracks the lifecycle of a single RL episode areal/experimental/openai/proxy/server.py73 It provides methods to update access time, check for timeouts, and export trajectories once the session is marked as finished areal/experimental/openai/proxy/server.py80-123

Sources: areal/experimental/openai/proxy/server.py66-123 areal/experimental/openai/proxy/server.py179-188

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/6.2-arealopenai-client