VOOZH about

URL: https://deepwiki.com/inclusionAI/AReaL/6.4-multi-turn-conversations

⇱ Multi-turn Conversations | inclusionAI/AReaL | DeepWiki


Loading...
Last indexed: 7 May 2026 (2e12c1)
Menu

Multi-turn Conversations

This page documents how multi-turn conversations are managed in AReaL's OpenAI-compatible client layer, including parent-child relationship tracking, message concatenation, and token-level prefix matching. For information about the overall agentic RL integration system, see 6.1 Agentic RL Overview For details on the InteractionCache data structure, see 6.3 InteractionCache and Session Tracking

Purpose and Scope

The multi-turn conversation system enables training RL agents on dialog tasks by tracking conversation history and efficiently managing token sequences across multiple turns. The system supports two distinct modes:

  1. HF mode: Applies the full chat template on each turn independently.
  2. Concat mode: Builds a conversation tree using prefix matching and token concatenation to ensure alignment.

This document focuses on the technical mechanics of both modes, the token alignment algorithm in areal/experimental/openai/client.py, and how parent-child relationships are established within the InteractionCache in areal/experimental/openai/cache.py.

Sources: areal/experimental/openai/client.py151-212 areal/experimental/openai/cache.py112-181


Chat Template Modes

The ArealOpenAI client supports two chat template modes, specified via the chat_template_type parameter during initialization of AsyncCompletions and AsyncResponses within the client.

Mode Comparison

ModeTemplate ApplicationParent TrackingUse Case
hfFull template per turnOptionalSimple dialogs, standard tokenizers
concatIncremental concatenationRequiredToken-efficient training, tree-structured conversations

The mode is set when creating the client:


Sources: areal/experimental/openai/client.py702-720


HF Chat Template Mode

In HF mode, each interaction applies the tokenizer's chat template to the complete message list independently. This follows standard HuggingFace tokenizer behavior.

Message Processing Flow


Sources: areal/experimental/openai/client.py66-67 areal/experimental/openai/client.py407-414

Token Generation

The tokenization process in HF mode within AsyncCompletions.create:


This produces a fresh token sequence for each turn without reference to previous turns' token IDs.

Sources: areal/experimental/openai/client.py408-414


Concat Chat Template Mode

Concat mode enables token-efficient multi-turn conversations by building a conversation tree and reusing parent tokens. This mode is critical for training because it preserves exact token-level alignment between turns.

Motivation

When training on multi-turn conversations, AReaL ensures that:

  • Prompt tokens from turn $N$ are exactly the same as output tokens from turn $N-1$.
  • No tokens are duplicated or lost between turns (e.g., handling BOS/EOS carefully).
  • The model sees consistent token sequences during training.

Concat mode achieves this by:

  1. Storing parent interaction's input_tokens and output_tokens.
  2. Computing child tokens by finding the overlap with parent tokens using the concat_prompt_token_ids_with_parent function.
  3. Extracting only the new tokens that represent the current turn's prompt.

Sources: areal/experimental/openai/client.py151-212 areal/experimental/openai/client.py415-431


Parent-Child Relationship Tracking

The InteractionCache automatically builds parent-child relationships using prefix matching when interactions are added via __setitem__.

Prefix Matching Algorithm


Sources: areal/experimental/openai/cache.py112-181

Prefix Check Implementation

The prefix check compares message lists element-wise:


For a child to match a parent:

  • parent.messages + parent.output_message_list must be a strict prefix of child.messages.
  • The longest matching prefix becomes the parent.

Sources: areal/experimental/openai/cache.py118-122 areal/experimental/openai/cache.py153-162


Token Concatenation with Parent

The concat_prompt_token_ids_with_parent function is the core of concat mode, performing token-level alignment between parent and child interactions.

High-Level Flow


Sources: areal/experimental/openai/client.py131-144 areal/experimental/openai/client.py151-212

Token Alignment Strategy

The algorithm ensures correct alignment through EOS token counting:

  1. Extract parent tokens: Concatenate parent's input_tokens and output_tokens_without_stop.
  2. Build full message list: Combine parent messages, parent output, and new messages.
  3. Tokenize full conversation: Apply chat template to get all_tokens.
  4. Count parent EOS tokens: Count how many EOS tokens appear in parent_tokens.
  5. Find alignment point: Locate the Nth EOS token in all_tokens where N matches parent EOS count using _find_kth.
  6. Extract new tokens: Take all tokens after the alignment point.
  7. Concatenate: prompt_token_ids = parent_tokens + all_tokens[child_tokens_truncate_idx + 1:].

Sources: areal/experimental/openai/client.py165-211

EOS Token Handling

The system handles EOS tokens carefully to maintain alignment:

CaseParent TerminationEOS Handling
Normal completionNatural EOSRemoved by output_tokens_without_stop, re-added manually in alignment function
Length exceededNo EOSExtra EOS added to align with chat template structure
AbortedNo EOSExtra EOS added to align with chat template structure

The added EOS token is treated as part of the child's prompt, not the parent's output, and is masked during training.

Sources: areal/experimental/openai/client.py178-185


Interaction Data Flow

Complete Multi-Turn Workflow


Sources: areal/experimental/openai/client.py341-592 areal/experimental/openai/cache.py112-181


Multi-turn Workflow Logic

The built-in MultiTurnWorkflow implementation (e.g., areal/experimental/workflow/multi_turn_v2.py) demonstrates iterative reasoning. It uses the ArealOpenAI client to manage a conversation loop where incorrect answers trigger reflection prompts.

Reward Discounting

The InteractionCache supports propagating rewards backward through the conversation tree using a geometric discount factor. This ensures that earlier turns that lead to a successful outcome receive appropriate credit.


Sources: areal/experimental/openai/cache.py55-105 areal/experimental/workflow/multi_turn_v2.py44-96


Tool Call Integration

Multi-turn conversations often involve tool use. The process_tool_calls utility parses model output for tool invocations using backend-specific parsers.

Tool Parsing Logic

  1. Reasoning Detection: Extracts thought process content using _detect_think_and_return_ori_think.
  2. Parser Selection: Maps SGLang-style parser names to vLLM equivalents (e.g., qwen25 to qwen3_xml) via _SGLANG_TO_VLLM_TOOL_PARSER.
  3. Function Parsing: Uses backend-specific FunctionCallParser (SGLang) or ToolParserManager (vLLM) to extract tool names and parameters from the content.
  4. Response Construction: Converts parsed info into ChatCompletionMessageFunctionToolCall or ResponseFunctionToolCall objects.

Sources: areal/experimental/openai/tool_call_parser.py18-31 areal/experimental/openai/tool_call_parser.py61-141 areal/experimental/openai/tool_call_parser.py144-211


Key Classes and Methods

EntityLocationPurpose
concat_prompt_token_ids_with_parentareal/experimental/openai/client.py151-212Core logic for token-level alignment in multi-turn.
InteractionCache.__setitem__areal/experimental/openai/cache.py107-181Logic for automatic parent-child link discovery.
InteractionWithTokenLogpRewardareal/experimental/openai/types.py36-214Data structure storing tokens, logprobs, and parent links.
process_tool_callsareal/experimental/openai/tool_call_parser.py34-141Parses model text into structured tool calls.
_is_prefixareal/experimental/openai/cache.py118-122Message-level prefix matching utility.

Sources: areal/experimental/openai/client.py151-212 areal/experimental/openai/cache.py107-181 areal/experimental/openai/types.py36-214 areal/experimental/openai/tool_call_parser.py34-141