Last indexed: 7 May 2026 (2e12c1)

InteractionCache and Session Tracking

This page documents the InteractionCache class and session tracking mechanisms used in AReaL's agentic RL integration. The cache stores individual LLM interactions (completions/responses) along with their metadata, rewards, and parent-child relationships to enable multi-turn conversation tracking and reward backpropagation.

For information about the ArealOpenAI client that uses this cache, see ArealOpenAI Client. For details on reward assignment and discounting algorithms, see Reward Assignment and Discounting. For information on how interactions are exported to training format, see Interaction Export.

Overview

The InteractionCache is a specialized dictionary that:

Stores interactions keyed by completion/call ID. areal/experimental/openai/cache.py13-15
Automatically constructs conversation trees by detecting prefix relationships between message histories during insertion. areal/experimental/openai/cache.py112-160
Manages rewards with support for per-interaction assignment and backward discounting. areal/experimental/openai/cache.py44-84
Exports interactions in different formats (individual or concatenated) for training. areal/experimental/openai/cache.py186-250

Sources: areal/experimental/openai/cache.py1-250

InteractionCache Class Structure

Code Entity Space Diagram

This diagram shows the relationship between the core caching classes and the underlying data structures used for inference results.

Key Components:

Component	Type	Purpose
`InteractionCache`	`OrderedDict` subclass	Main storage container with ordering preserved for reward propagation. areal/experimental/openai/cache.py13-15
`InteractionWithTokenLogpReward`	Data class	Stores single interaction with messages, response, and reward. areal/experimental/openai/types.py36-58
`_apply_reward_discount_called`	`bool`	Flag to prevent multiple reward discount applications. areal/experimental/openai/cache.py16
`_total_reward`	`float`	Cumulative reward across all cached interactions. areal/experimental/openai/cache.py17
`_lock`	`threading.Lock`	Thread safety for concurrent reward updates. areal/experimental/openai/cache.py18

Sources: areal/experimental/openai/cache.py13-18 areal/experimental/openai/types.py35-58

Session Tracking and Parent-Child Relationships

Automatic Parent Detection

When a new interaction is added to the cache via __setitem__, the cache automatically detects parent-child relationships using a longest prefix matching algorithm. This is critical for reconstructing conversation trees from stateless API calls.

Prefix Matching Logic:

The algorithm uses strict message-level prefix matching. For an interaction B to be considered a child of interaction A, the combined input (messages) and output (output_message_list) of A must match the start of B's message history. The cache iterates through potential parents sorted by message length to find the longest match. areal/experimental/openai/cache.py148-162

If a strict prefix match isn't found, but the base messages match, _is_similar_on_last_message is used to detect potential formatting mismatches (e.g., missing keys in the message dictionary) and logs a warning. areal/experimental/openai/cache.py163-183

Sources: areal/experimental/openai/cache.py107-184

Example: Conversation Tree Construction

The following diagram bridges the Natural Language Space (user prompts and assistant responses) to the Code Entity Space (cached InteractionWithTokenLogpReward objects).

Sources: areal/experimental/openai/cache.py13-25 areal/experimental/openai/cache.py148-162

Reward Management

Setting Rewards

The cache provides thread-safe methods for setting rewards, allowing for asynchronous reward assignment from external processes or reward functions.

Method	Parameters	Purpose
`set_reward(interaction_id, reward)`	`interaction_id: str`, `reward: float`	Set reward for specific interaction by ID. areal/experimental/openai/cache.py44-49
`set_last_reward(reward)`	`reward: float`	Set reward for most recently added interaction. areal/experimental/openai/cache.py51-53

Sources: areal/experimental/openai/cache.py44-53

Reward Discounting

The apply_reward_discount() method propagates rewards backward through the conversation history using geometric discounting. This is typically called before exporting for RL training to ensure early actions receive credit for later success. areal/experimental/openai/cache.py55-105

Algorithm Logic:

Iterates through the OrderedDict in reverse (newest interactions first). areal/experimental/openai/cache.py89-93
If an interaction has no reward, it defaults to 0.0. areal/experimental/openai/cache.py94-101
Propagates reward: current_reward = current_reward * turn_discount + interaction.reward. areal/experimental/openai/cache.py103-104

Sources: areal/experimental/openai/cache.py55-105

Proxy Server Session Tracking

In the experimental proxy architecture, sessions are managed by SessionData, which wraps an InteractionCache. areal/experimental/openai/proxy/server.py66-73

Session Lifecycle:

Creation: A SessionData object is initialized with a unique session_id. areal/experimental/openai/proxy/server.py69-70
Activity Tracking: Every request updates _last_access_time. areal/experimental/openai/proxy/server.py80-83
Timeout: Sessions are considered stale after SESSION_TIMEOUT_SECONDS (default 3600s). areal/experimental/openai/proxy/server.py18 areal/experimental/openai/proxy/server.py85-88
Completion: The finish() method triggers a threading.Event to signal completion to waiting workers. areal/experimental/openai/proxy/server.py90-93

Sources: areal/experimental/openai/proxy/server.py18-113

Export Modes

The export_interactions() method filters the cache based on the desired training style. areal/experimental/openai/cache.py186-250

Comparison Table

Feature	`style="individual"`	`style="concat"`
Returns	All interactions in the cache.	Only leaf nodes (no children).
Parent links	Preserved for metadata.	Used to identify leaves.
Use case	Standard turn-by-turn training.	Trajectory-based training (e.g., GRPO).
Constraint	None.	All interactions must use `"concat"` template.

Sources: areal/experimental/openai/cache.py186-250

Incomplete Interaction Handling

The cache handles concurrent or failed requests by validating that an interaction is complete before processing it as a parent or exporting it.

If export_interactions() is called while some requests are still in flight, those requests are logged as warnings and excluded from the exported dictionary to prevent training on partial data. areal/experimental/openai/cache.py221-240

Sources: areal/experimental/openai/cache.py157-158 areal/experimental/openai/cache.py221-240

Thread Safety

The InteractionCache uses a threading.Lock to ensure that reward updates are atomic, which is essential when multiple reward models or human annotators are submitting scores concurrently. areal/experimental/openai/cache.py18 areal/experimental/openai/cache.py46-49

Sources: areal/experimental/openai/cache.py18 areal/experimental/openai/cache.py46-49

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/6.3-interactioncache-and-session-tracking