VOOZH about

URL: https://deepwiki.com/inclusionAI/AReaL/6.2-arealopenai-client

⇱ ArealOpenAI Client | inclusionAI/AReaL | DeepWiki


Loading...
Last indexed: 7 May 2026 (2e12c1)
Menu

ArealOpenAI Client

This document describes the ArealOpenAI client, which provides an OpenAI-compatible interface to AReaL's inference engines with integrated reward tracking and conversation management for agentic RL. The client wraps inference engines to expose both the Chat Completions API and the Responses API (OpenAI Agents SDK), automatically caching interactions with token-level metadata and supporting backward reward propagation.

For the broader context of agentic RL integration patterns, see Agentic RL Overview. For details on the underlying session tracking and conversation tree structures, see InteractionCache and Session Tracking. For reward assignment methods, see Reward Assignment and Discounting.


Purpose and Scope

The ArealOpenAI client serves as the primary interface for integrating AReaL with agent frameworks that use the OpenAI API standard. It enables:


Core Architecture

Class Hierarchy

The following diagram maps the Natural Language concepts of the OpenAI API to the specific Code Entities in AReaL.


Sources: areal/experimental/openai/client.py213-332 areal/experimental/openai/cache.py13-18 areal/experimental/openai/client.py65-67


Client Initialization

The ArealOpenAI client is initialized with an inference engine, tokenizer, and configuration parameters areal/experimental/openai/client.py213-238


Initialization Parameters

ParameterTypeDescription
engine_AsyncGenerateEngineInference engine implementing agenerate(ModelRequest) -> ModelResponse areal/experimental/openai/client.py65-67
tokenizerPreTrainedTokenizerFastTokenizer for applying chat templates and encoding/decoding areal/experimental/openai/client.py216
tool_call_parserstrParser type for extracting tool calls (e.g., "qwen25") areal/experimental/openai/client.py220
reasoning_parserstrParser type for extracting reasoning tokens (e.g., "qwen3") areal/experimental/openai/client.py221
chat_template_typestrTemplate mode: "hf" (standard) or "concat" (tree building) areal/experimental/openai/client.py222
engine_max_tokens`intNone`

Sources: areal/experimental/openai/client.py213-238


Request Flow

Chat Completions API Flow

The diagram below illustrates how a standard ChatCompletion request flows through the client and interacts with AReaL's internal entities.


Sources: areal/experimental/openai/client.py335-586 areal/experimental/openai/cache.py107-171


Chat Completions API

The AsyncCompletionsWithReward class extends OpenAI's AsyncCompletions to provide reward tracking areal/experimental/openai/client.py335

Method Signature


Token Limit Handling

The client applies multiple token limits in priority order to compute the final max_new_tokens sent to the InferenceEngine areal/experimental/openai/client.py446-481

  1. max_total_tokens: Hard limit on total tokens (prompt + completion) areal/experimental/openai/client.py446
  2. engine_max_tokens: Engine-level limit set during initialization areal/experimental/openai/client.py447
  3. max_completion_tokens: Limit on generated tokens only areal/experimental/openai/client.py448

The effective max_new_tokens is computed by taking the minimum of available space in total/engine limits and the explicit completion token limit areal/experimental/openai/client.py470-481

Sources: areal/experimental/openai/client.py446-481


Responses API

The AsyncResponsesWithReward class extends OpenAI's AsyncResponses to support the Agents SDK format areal/experimental/openai/client.py716

Input Format Conversion

The Responses API accepts flexible input formats that are normalized to message lists using _ensure_message_dict_list areal/experimental/openai/client.py79-127


Sources: areal/experimental/openai/client.py760-833


Interaction Caching and Tracking

InteractionCache

The InteractionCache is an OrderedDict that automatically manages parent-child relationships between interactions areal/experimental/openai/cache.py13


Sources: areal/experimental/openai/cache.py107-171

Parent-Child Relationship Building

When a new interaction is added to the cache, the cache automatically searches for its parent by checking if parent.messages + parent.output_message_list is a strict prefix of new.messages areal/experimental/openai/cache.py159-162 This allows AReaL to reconstruct the conversation tree for multi-turn RL.


Chat Template Modes

Concat Mode (chat_template_type="concat")

Advanced mode that concatenates parent's tokens with child's new tokens to maintain exact token alignment areal/experimental/openai/client.py143-210

The concat_prompt_token_ids_with_parent function:

  1. Takes parent's full token sequence (input + output) areal/experimental/openai/client.py166-169
  2. Applies chat template to full conversation (parent + child messages) areal/experimental/openai/client.py192-197
  3. Finds split point by matching token IDs to extract only new tokens for the child areal/experimental/openai/client.py199-209

Sources: areal/experimental/openai/client.py143-210


Tool Call Support

The client automatically parses and structures tool calls from model outputs using process_tool_calls areal/experimental/openai/tool_call_parser.py55

Supported Parsers

The _SGLANG_TO_VLLM_TOOL_PARSER mapping ensures compatibility between SGLang and vLLM parser names areal/experimental/openai/tool_call_parser.py18-31

ParserFormatExample
qwen25<tool_call>\n{json}\n</tool_call>For Qwen 2.5 models areal/experimental/openai/tool_call_parser.py55
qwen3<thought> for reasoningFor Qwen 3 models areal/experimental/openai/tool_call_parser.py55

Sources: areal/experimental/openai/client.py525-540 areal/experimental/openai/tool_call_parser.py18-55


Reward Management

Backward Reward Propagation

The apply_reward_discount() method propagates rewards backward through conversation history using geometric discounting areal/experimental/openai/cache.py55-84

Algorithm:


Sources: areal/experimental/openai/cache.py55-105


Exporting Interactions for Training

The client exports cached interactions in different styles for training scenarios areal/experimental/openai/cache.py173-261

StyleDescriptionUse Case
"individual"Returns all cached interactions as-is areal/experimental/openai/cache.py214Standard RL training (PPO/GRPO)
"concat"Returns only leaf nodes with full conversation sequences areal/experimental/openai/cache.py218Tree-based RL or multi-turn trajectories

The to_tensor_dict() method in InteractionWithTokenLogpReward handles the heavy lifting of constructing logprobs, loss masks, and version tensors for training, including logic to mask out parent tokens in concat mode areal/experimental/openai/types.py143-201

Sources: areal/experimental/openai/cache.py173-261 areal/experimental/openai/types.py143-201


Proxy Server Architecture

For scenarios where agent runtimes are external to the AReaL training process, AReaL provides a proxy server that implements the OpenAI protocol and manages sessions via SessionData areal/experimental/openai/proxy/server.py66-123

Session Management

SessionData wraps an InteractionCache and tracks the lifecycle of a single RL episode areal/experimental/openai/proxy/server.py73 It provides methods to update access time, check for timeouts, and export trajectories once the session is marked as finished areal/experimental/openai/proxy/server.py80-123


Sources: areal/experimental/openai/proxy/server.py66-123 areal/experimental/openai/proxy/server.py179-188