
URL: https://deepwiki.com/inclusionAI/AReaL/13.5-metrics-tracking

Metrics Tracking | inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

Metrics Tracking

AReaL provides a unified metrics tracking system that handles statistics collection across distributed training and rollout workers. The system supports two distinct paradigms optimized for their respective use cases: streaming metrics for asynchronous rollout workflows and batch metrics for synchronous training updates.

System Architecture

The metrics system is built around a hierarchical aggregation pattern. Raw metrics are collected at the worker level (e.g., within a specific training step or rollout), aggregated across the distributed process group, and finally committed to a persistent logger.

Metrics Data Flow

The following diagram illustrates the flow from raw data generation in the engines to the final visualization backends.

Figure 1: Metrics Aggregation and Logging Pipeline


Sources: areal/utils/stats_tracker.py:31-41, areal/utils/stats_logger.py:135-160, areal/utils/stats_logger.py:152-158

DistributedStatsTracker

The DistributedStatsTracker is the core utility for in-memory aggregation. It lets different components isolate their metrics using hierarchical scoping and provides thread-safe recording via a threading.Lock (areal/utils/stats_tracker.py:31-41).
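To make hierarchical scoping and lock-protected recording concrete, here is a minimal sketch. The class ScopedTracker and its methods scope, scalar, and export are hypothetical names chosen for illustration; they are not claimed to match the signatures in areal/utils/stats_tracker.py.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ScopedTracker:
    """Illustrative stand-in; NOT AReaL's DistributedStatsTracker."""

    def __init__(self):
        self._lock = threading.Lock()
        # Simplification: one scope stack shared by all threads.
        self._scopes = []
        self._values = defaultdict(list)  # full key -> recorded floats

    @contextmanager
    def scope(self, name):
        # Push a scope name for the duration of the with-block.
        with self._lock:
            self._scopes.append(name)
        try:
            yield
        finally:
            with self._lock:
                self._scopes.pop()

    def scalar(self, **kwargs):
        # Prefix each key with the active scopes, then append under the lock.
        with self._lock:
            prefix = "/".join(self._scopes)
            for key, value in kwargs.items():
                full_key = f"{prefix}/{key}" if prefix else key
                self._values[full_key].append(float(value))

    def export(self):
        # Return the per-key mean and clear the buffer.
        with self._lock:
            result = {k: sum(v) / len(v) for k, v in self._values.items()}
            self._values.clear()
        return result


tracker = ScopedTracker()
with tracker.scope("rollout"):
    with tracker.scope("reward"):
        tracker.scalar(score=1.0)
        tracker.scalar(score=0.0)
tracker.scalar(loss=2.0)
stats = tracker.export()
```

In this sketch the nested scopes produce the key rollout/reward/score, while loss is recorded at the top level. A real multi-threaded tracker would likely need a per-thread scope stack; the shared stack here is kept only for brevity.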

Key Functions

Reduction Types

The tracker supports several reduction modes via the ReduceType enum (areal/utils/stats_tracker.py:19-26):

| Type | Description | Output Keys |
| --- | --- | --- |
| AVG_MIN_MAX | Default for tensors. Computes mean, min, and max. | key/avg, key/min, key/max |
| AVG | Weighted average based on denominator. | key |
| SUM | Simple summation across all elements/ranks. | key |
| MIN | Minimum value across elements/ranks. | key |
| MAX | Maximum value across elements/ranks. | key |
| SCALAR | Tracks a float and its occurrence count. | key, key__count |
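The mapping from reduction type to output keys in the table above can be sketched in plain Python. ReduceKind and reduce_values are hypothetical names; the sketch operates on lists of floats rather than tensors, models the AVG denominator as a list of per-element weights, and assumes that SCALAR reports the mean under key. None of this is confirmed to match AReaL's implementation.

```python
from enum import Enum, auto


class ReduceKind(Enum):
    """Hypothetical mirror of the reduction types in the table."""
    AVG_MIN_MAX = auto()
    AVG = auto()
    SUM = auto()
    MIN = auto()
    MAX = auto()
    SCALAR = auto()


def reduce_values(key, values, kind, weights=None):
    """Reduce a non-empty list of floats into the table's output keys."""
    if kind is ReduceKind.AVG_MIN_MAX:
        return {
            f"{key}/avg": sum(values) / len(values),
            f"{key}/min": min(values),
            f"{key}/max": max(values),
        }
    if kind is ReduceKind.AVG:
        # Assumption: the "denominator" is modeled as per-element weights.
        if weights is None:
            weights = [1.0] * len(values)
        total = sum(weights)
        return {key: sum(v * w for v, w in zip(values, weights)) / total}
    if kind is ReduceKind.SUM:
        return {key: sum(values)}
    if kind is ReduceKind.MIN:
        return {key: min(values)}
    if kind is ReduceKind.MAX:
        return {key: max(values)}
    if kind is ReduceKind.SCALAR:
        # Assumption: the value under `key` is the mean of recorded floats.
        return {key: sum(values) / len(values), f"{key}__count": len(values)}
    raise ValueError(f"unsupported reduction: {kind}")


amm = reduce_values("x", [1.0, 2.0, 3.0], ReduceKind.AVG_MIN_MAX)
avg = reduce_values("x", [2.0, 4.0], ReduceKind.AVG, weights=[1.0, 3.0])
scalar = reduce_values("x", [1.0, 3.0], ReduceKind.SCALAR)
```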

Sources: areal/utils/stats_tracker.py:19-26, areal/utils/stats_tracker.py:116-145, areal/utils/stats_tracker.py:190-196

Two Logging Paradigms

AReaL distinguishes between two logging paradigms based on the synchronization requirements of the workload.

1. Streaming Metrics (Rollout Workers)

Rollout workers execute workflows asynchronously. Each workflow logs scalars individually as they complete.

  • Characteristics: Metrics accumulate in a list within the worker process; no synchronization occurs during logging (areal/utils/stats_tracker.py:40).
  • Aggregation: Components such as RolloutController collect these raw stats and compute weighted averages. Keys with the __count suffix serve as weights in that calculation and are filtered out before visualization (areal/utils/stats_logger.py:148-149).
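The count-weighted aggregation described in the list above can be sketched as follows. The function merge_worker_stats and the sample worker dictionaries are hypothetical; only the __count suffix comes from the text, and treating a missing count as 1 is an assumption of this sketch.

```python
COUNT_SUFFIX = "__count"


def merge_worker_stats(worker_stats):
    """Combine per-worker means into one count-weighted mean per key."""
    totals, counts = {}, {}
    for stats in worker_stats:
        for key, value in stats.items():
            if key.endswith(COUNT_SUFFIX):
                continue  # counts are weights, not metrics to report
            # Assumption: a key without a matching count has weight 1.
            n = stats.get(key + COUNT_SUFFIX, 1)
            totals[key] = totals.get(key, 0.0) + value * n
            counts[key] = counts.get(key, 0) + n
    # The result contains no __count keys, so it is ready for display.
    return {k: totals[k] / counts[k] for k in totals if counts[k] > 0}


merged = merge_worker_stats([
    {"reward": 1.0, "reward__count": 3},  # worker A: mean over 3 samples
    {"reward": 0.0, "reward__count": 1},  # worker B: mean over 1 sample
])
```

Weighting by count matters here: an unweighted mean of the two worker averages would give 0.5, whereas the count-weighted result is 0.75, which matches the mean over all four underlying samples.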

2. Batch Metrics (Training Engines)

Training engines process data in synchronized batches across data-parallel ranks.

Sources: areal/utils/stats_tracker.py:152-184, areal/utils/stats_logger.py:148-149, docs/en/reference/metrics_tracking.md:29-87

StatsLogger and Backends

The StatsLogger manages the lifecycle of external tracking sessions. It initializes and commits only on Rank 0 to prevent redundant network calls (areal/utils/stats_logger.py:38-39).

Configuration to Code Mapping

The system maps hierarchical dataclasses to specific library initializations.

Figure 2: Metric Configuration Mapping


Sources: areal/utils/stats_logger.py:23-33, areal/utils/stats_logger.py:63-78, areal/utils/stats_logger.py:99-107, areal/utils/stats_logger.py:88-95

Implementation Details

Performance and Evaluation Tracking

Timing

The record_timing(key) context manager automatically tracks execution duration using time.perf_counter(). These records are always placed under the timeperf/ scope and recorded as ReduceType.SCALAR (areal/utils/stats_tracker.py:85-94).
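A timing context manager of this kind can be sketched with the standard library alone. The name record_timing_sketch and the sink dictionary are hypothetical; the real record_timing stores its result in the tracker rather than in a plain dict.

```python
import time
from contextlib import contextmanager


@contextmanager
def record_timing_sketch(key, sink):
    """Append the elapsed wall time of the with-block to sink['timeperf/<key>']."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed steps are still timed.
        elapsed = time.perf_counter() - start
        sink.setdefault(f"timeperf/{key}", []).append(elapsed)


timings = {}
with record_timing_sketch("train_step", timings):
    sum(range(1000))  # placeholder workload
```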

Evaluation Frequency

The Evaluator class uses EpochStepTimeFreqCtl to determine when to trigger evaluation functions based on epoch, step, or wall-clock time intervals (areal/utils/evaluator.py:10-20).
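The idea of triggering on whichever of several intervals elapses first can be sketched as below. FreqControl, its parameters freq_epoch, freq_step, and freq_sec, and the check method are hypothetical; they illustrate the concept and are not the signature of EpochStepTimeFreqCtl.

```python
import time


class FreqControl:
    """Hypothetical trigger: fires when any configured interval has elapsed."""

    def __init__(self, freq_epoch=None, freq_step=None, freq_sec=None,
                 clock=time.monotonic):
        self.freq_epoch = freq_epoch
        self.freq_step = freq_step
        self.freq_sec = freq_sec
        self._clock = clock  # injectable so the behavior can be tested
        self._epochs = 0
        self._steps = 0
        self._last_time = clock()

    def check(self, epochs=0, steps=0):
        # Accumulate progress since the last trigger.
        self._epochs += epochs
        self._steps += steps
        now = self._clock()
        due = (
            (self.freq_epoch is not None and self._epochs >= self.freq_epoch)
            or (self.freq_step is not None and self._steps >= self.freq_step)
            or (self.freq_sec is not None
                and now - self._last_time >= self.freq_sec)
        )
        if due:
            # Reset all counters so the next interval starts fresh.
            self._epochs = 0
            self._steps = 0
            self._last_time = now
        return due


ctl = FreqControl(freq_step=3)
fired = [ctl.check(steps=1) for _ in range(6)]
```

With freq_step=3, the controller in this sketch fires on every third call, so an evaluation loop would call its evaluation function only when check returns True.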

Commit and State

The StatsLogger maintains a _last_commit_step to ensure sequential logging across global steps (areal/utils/stats_logger.py:145-159). It supports saving and loading this state via state_dict() and load_state_dict() to maintain continuity across experiment resumes (areal/utils/stats_logger.py:114-120).
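One way such commit state could behave is sketched below. CommitState is a hypothetical class; the rule that a step at or before the last committed one is skipped is an assumption about what "sequential logging" means here, and the in-memory committed list stands in for a real tracking backend.

```python
class CommitState:
    """Hypothetical logger core: rank-0-only commits with resumable state."""

    def __init__(self, rank):
        self.rank = rank
        self._last_commit_step = -1
        self.committed = []  # stand-in for an external backend

    def commit(self, global_step, stats):
        if self.rank != 0:
            return False  # only rank 0 talks to the backend
        if global_step <= self._last_commit_step:
            return False  # assumption: already-logged steps are skipped
        self.committed.append((global_step, dict(stats)))
        self._last_commit_step = global_step
        return True

    def state_dict(self):
        return {"last_commit_step": self._last_commit_step}

    def load_state_dict(self, state):
        self._last_commit_step = state["last_commit_step"]


logger = CommitState(rank=0)
logger.commit(0, {"loss": 1.0})

# Simulate a resume: a fresh logger restores the saved step counter.
resumed = CommitState(rank=0)
resumed.load_state_dict(logger.state_dict())
replayed = resumed.commit(0, {"loss": 1.0})   # step 0 was already logged
advanced = resumed.commit(1, {"loss": 0.9})   # step 1 is new
```

Restoring the counter on resume is what keeps a restarted run from logging step 0 a second time in this sketch.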

Sources: areal/utils/stats_tracker.py:85-94, areal/utils/stats_logger.py:114-120, areal/utils/stats_logger.py:145-159, areal/utils/evaluator.py:10-20