VOOZH about

URL: https://deepwiki.com/inclusionAI/AReaL/6.3-interactioncache-and-session-tracking

⇱ InteractionCache and Session Tracking | inclusionAI/AReaL | DeepWiki


Loading...
Last indexed: 7 May 2026 (2e12c1)
Menu

InteractionCache and Session Tracking

This page documents the InteractionCache class and session tracking mechanisms used in AReaL's agentic RL integration. The cache stores individual LLM interactions (completions/responses) along with their metadata, rewards, and parent-child relationships to enable multi-turn conversation tracking and reward backpropagation.

For information about the ArealOpenAI client that uses this cache, see ArealOpenAI Client. For details on reward assignment and discounting algorithms, see Reward Assignment and Discounting. For information on how interactions are exported to training format, see Interaction Export.


Overview

The InteractionCache is a specialized dictionary that:

  1. Stores interactions keyed by completion/call ID. areal/experimental/openai/cache.py13-15
  2. Automatically constructs conversation trees by detecting prefix relationships between message histories during insertion. areal/experimental/openai/cache.py112-160
  3. Manages rewards with support for per-interaction assignment and backward discounting. areal/experimental/openai/cache.py44-84
  4. Exports interactions in different formats (individual or concatenated) for training. areal/experimental/openai/cache.py186-250

Sources: areal/experimental/openai/cache.py1-250


InteractionCache Class Structure

Code Entity Space Diagram

This diagram shows the relationship between the core caching classes and the underlying data structures used for inference results.


Key Components:

ComponentTypePurpose
InteractionCacheOrderedDict subclassMain storage container with ordering preserved for reward propagation. areal/experimental/openai/cache.py13-15
InteractionWithTokenLogpRewardData classStores single interaction with messages, response, and reward. areal/experimental/openai/types.py36-58
_apply_reward_discount_calledboolFlag to prevent multiple reward discount applications. areal/experimental/openai/cache.py16
_total_rewardfloatCumulative reward across all cached interactions. areal/experimental/openai/cache.py17
_lockthreading.LockThread safety for concurrent reward updates. areal/experimental/openai/cache.py18

Sources: areal/experimental/openai/cache.py13-18 areal/experimental/openai/types.py35-58


Session Tracking and Parent-Child Relationships

Automatic Parent Detection

When a new interaction is added to the cache via __setitem__, the cache automatically detects parent-child relationships using a longest prefix matching algorithm. This is critical for reconstructing conversation trees from stateless API calls.


Prefix Matching Logic:

The algorithm uses strict message-level prefix matching. For an interaction B to be considered a child of interaction A, the combined input (messages) and output (output_message_list) of A must match the start of B's message history. The cache iterates through potential parents sorted by message length to find the longest match. areal/experimental/openai/cache.py148-162


If a strict prefix match isn't found, but the base messages match, _is_similar_on_last_message is used to detect potential formatting mismatches (e.g., missing keys in the message dictionary) and logs a warning. areal/experimental/openai/cache.py163-183

Sources: areal/experimental/openai/cache.py107-184


Example: Conversation Tree Construction

The following diagram bridges the Natural Language Space (user prompts and assistant responses) to the Code Entity Space (cached InteractionWithTokenLogpReward objects).


Sources: areal/experimental/openai/cache.py13-25 areal/experimental/openai/cache.py148-162


Reward Management

Setting Rewards

The cache provides thread-safe methods for setting rewards, allowing for asynchronous reward assignment from external processes or reward functions.

MethodParametersPurpose
set_reward(interaction_id, reward)interaction_id: str, reward: floatSet reward for specific interaction by ID. areal/experimental/openai/cache.py44-49
set_last_reward(reward)reward: floatSet reward for most recently added interaction. areal/experimental/openai/cache.py51-53

Sources: areal/experimental/openai/cache.py44-53

Reward Discounting

The apply_reward_discount() method propagates rewards backward through the conversation history using geometric discounting. This is typically called before exporting for RL training to ensure early actions receive credit for later success. areal/experimental/openai/cache.py55-105

Algorithm Logic:

  1. Iterates through the OrderedDict in reverse (newest interactions first). areal/experimental/openai/cache.py89-93
  2. If an interaction has no reward, it defaults to 0.0. areal/experimental/openai/cache.py94-101
  3. Propagates reward: current_reward = current_reward * turn_discount + interaction.reward. areal/experimental/openai/cache.py103-104

Sources: areal/experimental/openai/cache.py55-105


Proxy Server Session Tracking

In the experimental proxy architecture, sessions are managed by SessionData, which wraps an InteractionCache. areal/experimental/openai/proxy/server.py66-73

Session Lifecycle:

Sources: areal/experimental/openai/proxy/server.py18-113


Export Modes

The export_interactions() method filters the cache based on the desired training style. areal/experimental/openai/cache.py186-250

Comparison Table

Featurestyle="individual"style="concat"
ReturnsAll interactions in the cache.Only leaf nodes (no children).
Parent linksPreserved for metadata.Used to identify leaves.
Use caseStandard turn-by-turn training.Trajectory-based training (e.g., GRPO).
ConstraintNone.All interactions must use "concat" template.

Sources: areal/experimental/openai/cache.py186-250


Incomplete Interaction Handling

The cache handles concurrent or failed requests by validating that an interaction is complete before processing it as a parent or exporting it.


If export_interactions() is called while some requests are still in flight, those requests are logged as warnings and excluded from the exported dictionary to prevent training on partial data. areal/experimental/openai/cache.py221-240

Sources: areal/experimental/openai/cache.py157-158 areal/experimental/openai/cache.py221-240


Thread Safety

The InteractionCache uses a threading.Lock to ensure that reward updates are atomic, which is essential when multiple reward models or human annotators are submitting scores concurrently. areal/experimental/openai/cache.py18 areal/experimental/openai/cache.py46-49


Sources: areal/experimental/openai/cache.py18 areal/experimental/openai/cache.py46-49