Last indexed: 7 May 2026 (2e12c1)

Multi-turn Conversations

This page documents how multi-turn conversations are managed in AReaL's OpenAI-compatible client layer, including parent-child relationship tracking, message concatenation, and token-level prefix matching. For information about the overall agentic RL integration system, see 6.1 Agentic RL Overview For details on the InteractionCache data structure, see 6.3 InteractionCache and Session Tracking

Purpose and Scope

The multi-turn conversation system enables training RL agents on dialog tasks by tracking conversation history and efficiently managing token sequences across multiple turns. The system supports two distinct modes:

HF mode: Applies the full chat template on each turn independently.
Concat mode: Builds a conversation tree using prefix matching and token concatenation to ensure alignment.

This document focuses on the technical mechanics of both modes, the token alignment algorithm in areal/experimental/openai/client.py, and how parent-child relationships are established within the InteractionCache in areal/experimental/openai/cache.py.

Sources: areal/experimental/openai/client.py151-212 areal/experimental/openai/cache.py112-181

Chat Template Modes

The ArealOpenAI client supports two chat template modes, specified via the chat_template_type parameter during initialization of AsyncCompletions and AsyncResponses within the client.

Mode Comparison

Mode	Template Application	Parent Tracking	Use Case
`hf`	Full template per turn	Optional	Simple dialogs, standard tokenizers
`concat`	Incremental concatenation	Required	Token-efficient training, tree-structured conversations

The mode is set when creating the client:

Sources: areal/experimental/openai/client.py702-720

HF Chat Template Mode

In HF mode, each interaction applies the tokenizer's chat template to the complete message list independently. This follows standard HuggingFace tokenizer behavior.

Message Processing Flow

Sources: areal/experimental/openai/client.py66-67 areal/experimental/openai/client.py407-414

Token Generation

The tokenization process in HF mode within AsyncCompletions.create:

This produces a fresh token sequence for each turn without reference to previous turns' token IDs.

Sources: areal/experimental/openai/client.py408-414

Concat Chat Template Mode

Concat mode enables token-efficient multi-turn conversations by building a conversation tree and reusing parent tokens. This mode is critical for training because it preserves exact token-level alignment between turns.

Motivation

When training on multi-turn conversations, AReaL ensures that:

Prompt tokens from turn $N$ are exactly the same as output tokens from turn $N-1$.
No tokens are duplicated or lost between turns (e.g., handling BOS/EOS carefully).
The model sees consistent token sequences during training.

Concat mode achieves this by:

Storing parent interaction's input_tokens and output_tokens.
Computing child tokens by finding the overlap with parent tokens using the concat_prompt_token_ids_with_parent function.
Extracting only the new tokens that represent the current turn's prompt.

Sources: areal/experimental/openai/client.py151-212 areal/experimental/openai/client.py415-431

Parent-Child Relationship Tracking

The InteractionCache automatically builds parent-child relationships using prefix matching when interactions are added via __setitem__.

Prefix Matching Algorithm

Sources: areal/experimental/openai/cache.py112-181

Prefix Check Implementation

The prefix check compares message lists element-wise:

For a child to match a parent:

parent.messages + parent.output_message_list must be a strict prefix of child.messages.
The longest matching prefix becomes the parent.

Sources: areal/experimental/openai/cache.py118-122 areal/experimental/openai/cache.py153-162

Token Concatenation with Parent

The concat_prompt_token_ids_with_parent function is the core of concat mode, performing token-level alignment between parent and child interactions.

High-Level Flow

Sources: areal/experimental/openai/client.py131-144 areal/experimental/openai/client.py151-212

Token Alignment Strategy

The algorithm ensures correct alignment through EOS token counting:

Extract parent tokens: Concatenate parent's input_tokens and output_tokens_without_stop.
Build full message list: Combine parent messages, parent output, and new messages.
Tokenize full conversation: Apply chat template to get all_tokens.
Count parent EOS tokens: Count how many EOS tokens appear in parent_tokens.
Find alignment point: Locate the Nth EOS token in all_tokens where N matches parent EOS count using _find_kth.
Extract new tokens: Take all tokens after the alignment point.
Concatenate: prompt_token_ids = parent_tokens + all_tokens[child_tokens_truncate_idx + 1:].

Sources: areal/experimental/openai/client.py165-211

EOS Token Handling

The system handles EOS tokens carefully to maintain alignment:

Case	Parent Termination	EOS Handling
Normal completion	Natural EOS	Removed by `output_tokens_without_stop`, re-added manually in alignment function
Length exceeded	No EOS	Extra EOS added to align with chat template structure
Aborted	No EOS	Extra EOS added to align with chat template structure

The added EOS token is treated as part of the child's prompt, not the parent's output, and is masked during training.

Sources: areal/experimental/openai/client.py178-185

Interaction Data Flow

Complete Multi-Turn Workflow

Sources: areal/experimental/openai/client.py341-592 areal/experimental/openai/cache.py112-181

Multi-turn Workflow Logic

The built-in MultiTurnWorkflow implementation (e.g., areal/experimental/workflow/multi_turn_v2.py) demonstrates iterative reasoning. It uses the ArealOpenAI client to manage a conversation loop where incorrect answers trigger reflection prompts.

Reward Discounting

The InteractionCache supports propagating rewards backward through the conversation tree using a geometric discount factor. This ensures that earlier turns that lead to a successful outcome receive appropriate credit.

Sources: areal/experimental/openai/cache.py55-105 areal/experimental/workflow/multi_turn_v2.py44-96

Tool Call Integration

Multi-turn conversations often involve tool use. The process_tool_calls utility parses model output for tool invocations using backend-specific parsers.

Tool Parsing Logic

Reasoning Detection: Extracts thought process content using _detect_think_and_return_ori_think.
Parser Selection: Maps SGLang-style parser names to vLLM equivalents (e.g., qwen25 to qwen3_xml) via _SGLANG_TO_VLLM_TOOL_PARSER.
Function Parsing: Uses backend-specific FunctionCallParser (SGLang) or ToolParserManager (vLLM) to extract tool names and parameters from the content.
Response Construction: Converts parsed info into ChatCompletionMessageFunctionToolCall or ResponseFunctionToolCall objects.

Sources: areal/experimental/openai/tool_call_parser.py18-31 areal/experimental/openai/tool_call_parser.py61-141 areal/experimental/openai/tool_call_parser.py144-211

Key Classes and Methods

Entity	Location	Purpose
`concat_prompt_token_ids_with_parent`	areal/experimental/openai/client.py151-212	Core logic for token-level alignment in multi-turn.
`InteractionCache.__setitem__`	areal/experimental/openai/cache.py107-181	Logic for automatic parent-child link discovery.
`InteractionWithTokenLogpReward`	areal/experimental/openai/types.py36-214	Data structure storing tokens, logprobs, and parent links.
`process_tool_calls`	areal/experimental/openai/tool_call_parser.py34-141	Parses model text into structured tool calls.
`_is_prefix`	areal/experimental/openai/cache.py118-122	Message-level prefix matching utility.

Sources: areal/experimental/openai/client.py151-212 areal/experimental/openai/cache.py107-181 areal/experimental/openai/types.py36-214 areal/experimental/openai/tool_call_parser.py34-141

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/6.4-multi-turn-conversations