Last indexed: 7 May 2026 (2e12c1)

Metrics Tracking

AReaL provides a unified metrics tracking system that handles statistics collection across distributed training and rollout workers. The system supports two distinct paradigms optimized for their respective use cases: streaming metrics for asynchronous rollout workflows and batch metrics for synchronous training updates.

System Architecture

The metrics system is built around a hierarchical aggregation pattern. Raw metrics are collected at the worker level (e.g., within a specific training step or rollout), aggregated across the distributed process group, and finally committed to a persistent logger.

Metrics Data Flow

The following diagram illustrates the flow from raw data generation in the engines to the final visualization backends.

Figure 1: Metrics Aggregation and Logging Pipeline

Sources: areal/utils/stats_tracker.py:31-41, areal/utils/stats_logger.py:135-160, areal/utils/stats_logger.py:152-158

DistributedStatsTracker

The DistributedStatsTracker is the core utility for in-memory metric aggregation. It lets components isolate their metrics through hierarchical scoping, and it guards recording with a threading.Lock so that concurrent callers are safe (areal/utils/stats_tracker.py:31-41).

Key Functions

scope(name): A context manager for hierarchical scoping (e.g., ppo/actor/loss) (areal/utils/stats_tracker.py:42-46).
denominator(**kwargs): Defines boolean masks that serve as denominators for tensor reductions. This is what lets reductions ignore padding positions in packed sequences (areal/utils/stats_tracker.py:96-108).
stat(denominator, reduce_type, **kwargs): Records floating-point tensors associated with a denominator (areal/utils/stats_tracker.py:116-145).
scalar(**kwargs): Records individual float values (areal/utils/stats_tracker.py:109-114).
export(reduce_group): Performs the distributed reduction. It synchronizes keys across all ranks using torch.distributed.all_gather_object and then computes the final scalars (areal/utils/stats_tracker.py:152-184).
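To make the interplay of scope, denominator, stat and scalar concrete, here is a minimal single-process sketch. MiniStatsTracker is a hypothetical class, not AReaL's DistributedStatsTracker: it uses plain lists instead of tensors, implements only the default AVG_MIN_MAX reduction, assumes SCALAR exports the mean plus a count, and reduces locally rather than across a process group. It shows how nested scopes build hierarchical keys and how a boolean mask excludes padding from the statistics.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class MiniStatsTracker:
    """Hypothetical single-process stand-in for DistributedStatsTracker.

    Plain lists replace tensors, only the default AVG_MIN_MAX reduction is
    implemented, and export() reduces locally instead of across ranks.
    """

    def __init__(self):
        self._lock = threading.Lock()      # guards the recording methods
        self._scopes = []                  # active scope names, outermost first
        self._masks = {}                   # full key -> list[bool] denominator
        self._values = defaultdict(list)   # full key -> values kept by the mask
        self._scalars = defaultdict(list)  # full key -> recorded floats

    def _full(self, key):
        # Join the active scopes into a hierarchical key such as "ppo/actor/loss".
        return "/".join(self._scopes + [key])

    @contextmanager
    def scope(self, name):
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()

    def denominator(self, **masks):
        with self._lock:
            for key, mask in masks.items():
                self._masks[self._full(key)] = list(mask)

    def stat(self, denominator, **values):
        with self._lock:
            mask = self._masks[self._full(denominator)]
            for key, vals in values.items():
                # Keep only positions the mask marks as valid (drops padding).
                self._values[self._full(key)].extend(
                    v for v, m in zip(vals, mask) if m
                )

    def scalar(self, **values):
        with self._lock:
            for key, v in values.items():
                self._scalars[self._full(key)].append(float(v))

    def export(self):
        with self._lock:
            out = {}
            for key, kept in self._values.items():
                if kept:
                    out[f"{key}/avg"] = sum(kept) / len(kept)
                    out[f"{key}/min"] = min(kept)
                    out[f"{key}/max"] = max(kept)
            for key, vals in self._scalars.items():
                out[key] = sum(vals) / len(vals)
                out[f"{key}__count"] = len(vals)
            # Clear buffers so the next step starts fresh.
            self._masks.clear()
            self._values.clear()
            self._scalars.clear()
            return out


tracker = MiniStatsTracker()
with tracker.scope("ppo"), tracker.scope("actor"):
    tracker.denominator(n_valid_tokens=[True, True, False])  # 3rd token is padding
    tracker.stat(denominator="n_valid_tokens", loss=[1.0, 3.0, 100.0])
    tracker.scalar(lr=1e-5)

stats = tracker.export()
print(stats)
# {'ppo/actor/loss/avg': 2.0, 'ppo/actor/loss/min': 1.0, 'ppo/actor/loss/max': 3.0, 'ppo/actor/lr': 1e-05, 'ppo/actor/lr__count': 1}
```

The padded value 100.0 never reaches the average, which is the reason masks are registered as named denominators rather than applied ad hoc by each caller.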

Reduction Types

The tracker supports several reduction modes through the ReduceType enum (areal/utils/stats_tracker.py:19-26):

Type	Description	Output Keys
`AVG_MIN_MAX`	Default for tensors. Computes mean, min, and max.	`key/avg`, `key/min`, `key/max`
`AVG`	Weighted average based on denominator.	`key`
`SUM`	Simple summation across all elements/ranks.	`key`
`MIN`	Minimum value across elements/ranks.	`key`
`MAX`	Maximum value across elements/ranks.	`key`
`SCALAR`	Tracks a float and its occurrence count.	`key`, `key__count`

Sources: areal/utils/stats_tracker.py:19-26, areal/utils/stats_tracker.py:116-145, areal/utils/stats_tracker.py:190-196
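The table can be read as a mapping from a recorded stat to its output keys. The sketch below mirrors that mapping for a single list of values; the ReduceType enum here is an illustrative copy of the table's names, not AReaL's enum, and reduce_stat is a hypothetical helper. It assumes the mask plays the role of the denominator and that SCALAR reports the mean of the recorded floats alongside their count.

```python
from enum import Enum, auto


class ReduceType(Enum):
    # Illustrative mirror of the modes in the table; not AReaL's enum.
    AVG_MIN_MAX = auto()
    AVG = auto()
    SUM = auto()
    MIN = auto()
    MAX = auto()
    SCALAR = auto()


def reduce_stat(key, values, reduce_type, mask=None):
    """Reduce one recorded stat into the output keys listed in the table.

    `mask` acts as the denominator: False entries (e.g. padding) are excluded.
    SCALAR ignores the mask and reports the mean plus an occurrence count.
    """
    if reduce_type is ReduceType.SCALAR:
        return {key: sum(values) / len(values), f"{key}__count": len(values)}
    if mask is None:
        mask = [True] * len(values)
    kept = [v for v, m in zip(values, mask) if m]
    if not kept:
        return {}  # nothing valid to reduce
    if reduce_type is ReduceType.AVG_MIN_MAX:
        return {f"{key}/avg": sum(kept) / len(kept),
                f"{key}/min": min(kept),
                f"{key}/max": max(kept)}
    if reduce_type is ReduceType.AVG:
        return {key: sum(kept) / len(kept)}
    if reduce_type is ReduceType.SUM:
        return {key: sum(kept)}
    if reduce_type is ReduceType.MIN:
        return {key: min(kept)}
    if reduce_type is ReduceType.MAX:
        return {key: max(kept)}
    raise ValueError(f"unknown reduce type: {reduce_type}")


values = [0.5, 1.5, 4.0, 9.0]
mask = [True, True, True, False]  # last position is padding
result = reduce_stat("reward", values, ReduceType.AVG_MIN_MAX, mask)
print(result)  # {'reward/avg': 2.0, 'reward/min': 0.5, 'reward/max': 4.0}
```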

Two Logging Paradigms

AReaL distinguishes between two logging paradigms based on the synchronization requirements of the workload.

1. Streaming Metrics (Rollout Workers)

Rollout workers execute workflows asynchronously. Each workflow logs scalars individually as they complete.

Characteristics: Metrics accumulate in a list within the worker process; no synchronization occurs during logging (areal/utils/stats_tracker.py:40).
Aggregation: Components such as RolloutController collect these raw stats and compute weighted averages. Keys with the __count suffix supply the weights for this calculation and are then filtered out so that they are not visualized (areal/utils/stats_logger.py:148-149).
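The weighted-average step can be sketched as follows. merge_streaming_stats is a hypothetical helper, not RolloutController code; it assumes each workflow reports `key` as a mean and `key__count` as the number of samples behind that mean. The counts are used as weights and never appear in the returned dict.

```python
def merge_streaming_stats(stats_list):
    """Combine per-workflow scalar stats into one dict for logging.

    Assumes each dict holds `key` (a mean) and `key__count` (how many samples
    that mean covers). Counts serve as weights and are dropped from the output.
    """
    weighted_sum, total_count = {}, {}
    for stats in stats_list:
        for key, value in stats.items():
            if key.endswith("__count"):
                continue  # consumed below as a weight, never logged
            count = stats.get(f"{key}__count", 1)
            weighted_sum[key] = weighted_sum.get(key, 0.0) + value * count
            total_count[key] = total_count.get(key, 0) + count
    return {k: weighted_sum[k] / total_count[k] for k in weighted_sum}


per_workflow = [
    {"reward": 1.0, "reward__count": 3},
    {"reward": 0.0, "reward__count": 1},
]
merged = merge_streaming_stats(per_workflow)
print(merged)  # {'reward': 0.75}
```

A plain mean of the two reported values would give 0.5; weighting by count gives 0.75, the true average over all four samples.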

2. Batch Metrics (Training Engines)

Training engines process data in synchronized batches across data-parallel ranks.

Characteristics: Entire batch tensors are logged together with their masks, which act as denominators (areal/utils/stats_tracker.py:116-120).
Aggregation: The engine calls export(reduce_group=dp_group), which triggers an all_gather_object across the specified group so that all ranks receive identical aggregated results (areal/utils/stats_tracker.py:168-173).
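Why reduce across ranks instead of averaging per-rank averages? The sketch below simulates the batch path in a single process. local_partials and combine_rank_partials are hypothetical helpers, and gathering per-rank partial sums directly is a simplification of the documented flow (which first synchronizes keys with all_gather_object and then computes the scalars). In a real job the `gathered` list would be filled by `torch.distributed.all_gather_object(gathered, local, group=dp_group)`; here it is an ordinary list, so every simulated rank computes the same result.

```python
import math


def local_partials(stats):
    """Per-rank summary for each key: masked sum, valid count, min and max.

    `stats` maps key -> (values, mask). Sums and counts, not averages, are
    what must cross ranks so the final average is weighted correctly.
    """
    out = {}
    for key, (values, mask) in stats.items():
        kept = [v for v, m in zip(values, mask) if m]
        out[key] = (sum(kept), len(kept),
                    min(kept, default=math.inf), max(kept, default=-math.inf))
    return out


def combine_rank_partials(gathered):
    """Merge the per-rank partials that every rank would receive."""
    keys = sorted(set().union(*gathered))  # union: some rank may lack a key
    result = {}
    for key in keys:
        parts = [p[key] for p in gathered if key in p]
        total = sum(p[0] for p in parts)
        count = sum(p[1] for p in parts)
        if count == 0:
            continue  # every position was masked out on every rank
        result[f"{key}/avg"] = total / count
        result[f"{key}/min"] = min(p[2] for p in parts)
        result[f"{key}/max"] = max(p[3] for p in parts)
    return result


rank0 = local_partials({"loss": ([1.0, 2.0, 3.0], [True, True, True])})
rank1 = local_partials({"loss": ([5.0, 0.0], [True, False])})
merged = combine_rank_partials([rank0, rank1])
print(merged)  # {'loss/avg': 2.75, 'loss/min': 1.0, 'loss/max': 5.0}
```

Averaging the per-rank averages (2.0 and 5.0) would report 3.5; pooling sums and counts yields 2.75, the average over all four valid tokens.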

Sources: areal/utils/stats_tracker.py:152-184, areal/utils/stats_logger.py:148-149, docs/en/reference/metrics_tracking.md:29-87

StatsLogger and Backends

The StatsLogger manages the lifecycle of external tracking sessions. It initializes backends and commits metrics only on rank 0, which avoids redundant network calls (areal/utils/stats_logger.py:38-39).

Configuration to Code Mapping

The system maps each backend's configuration dataclass in the hierarchical config to the initialization call of the corresponding tracking library.

Figure 2: Metric Configuration Mapping

Sources: areal/utils/stats_logger.py:23-33, areal/utils/stats_logger.py:63-78, areal/utils/stats_logger.py:99-107, areal/utils/stats_logger.py:88-95

Implementation Details

WandB: Captures full experiment metadata, including Git version_info (commit ID, branch, dirty status, and version string) (areal/utils/stats_logger.py:55-61).
Trackio: Enabled through TrackioConfig. It is used for high-performance experiment tracking and requires a space_id (areal/utils/stats_logger.py:99-107).
TensorBoard: Uses tensorboardX.SummaryWriter to log scalars to the local path given in config.tensorboard.path (areal/utils/stats_logger.py:111-112).
SwanLab: Supported as an alternative to WandB for experiment visualization, including custom API keys and project names (areal/utils/stats_logger.py:80-95).

Performance and Evaluation Tracking

Timing

The record_timing(key) context manager automatically measures execution duration using time.perf_counter(). These records are placed under the timeperf/ scope and recorded as ReduceType.SCALAR (areal/utils/stats_tracker.py:85-94).
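A timing context manager of this kind can be sketched with contextlib and time.perf_counter. record_timing_sketch and the module-level `timings` dict are hypothetical stand-ins, not AReaL's record_timing; the sketch only shows the pattern of prefixing keys with timeperf/ and appending one float per call, which is the shape a SCALAR reduction consumes.

```python
import time
from contextlib import contextmanager

timings = {}  # hypothetical store: "timeperf/<key>" -> list of durations in seconds


@contextmanager
def record_timing_sketch(key):
    """Record the wall-clock duration of the enclosed block under timeperf/<key>."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed steps are still timed.
        timings.setdefault(f"timeperf/{key}", []).append(time.perf_counter() - start)


with record_timing_sketch("train_step"):
    sum(i * i for i in range(100_000))  # placeholder workload

durations = timings["timeperf/train_step"]
print(f"{durations[0]:.4f}s over {len(durations)} call(s)")
```

Recording in the `finally` clause is a design choice of this sketch: a step that raises still contributes a duration.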

Evaluation Frequency

The Evaluator class uses EpochStepTimeFreqCtl to decide when to trigger evaluation functions, based on epoch, step, or wall-clock time intervals (areal/utils/evaluator.py:10-20).
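The idea of an epoch/step/time trigger can be sketched as a small class. FreqCtlSketch, its constructor arguments and its check() method are hypothetical and do not reproduce EpochStepTimeFreqCtl's API. The sketch fires when any configured interval has elapsed since the last firing and then resets all counters; the injectable `clock` argument is only there to make the time criterion testable.

```python
import time


class FreqCtlSketch:
    """Hypothetical epoch/step/time trigger; not EpochStepTimeFreqCtl's API.

    Fires when any configured interval has elapsed since the last firing.
    A value of None disables that criterion.
    """

    def __init__(self, freq_epoch=None, freq_step=None, freq_sec=None,
                 clock=time.monotonic):
        self.freq_epoch = freq_epoch
        self.freq_step = freq_step
        self.freq_sec = freq_sec
        self.clock = clock
        self.epochs = 0
        self.steps = 0
        self.last_time = clock()

    def check(self, epochs=0, steps=0):
        """Advance counters by the given amounts; return True if an eval is due."""
        self.epochs += epochs
        self.steps += steps
        now = self.clock()
        due = (
            (self.freq_epoch is not None and self.epochs >= self.freq_epoch)
            or (self.freq_step is not None and self.steps >= self.freq_step)
            or (self.freq_sec is not None and now - self.last_time >= self.freq_sec)
        )
        if due:
            # Reset every counter so the next trigger is measured from here.
            self.epochs, self.steps, self.last_time = 0, 0, now
        return due


ctl = FreqCtlSketch(freq_step=3)
pattern = [ctl.check(steps=1) for _ in range(7)]
print(pattern)  # [False, False, True, False, False, True, False]
```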

Commit and State

The StatsLogger maintains a _last_commit_step to ensure sequential logging across global steps (areal/utils/stats_logger.py:145-159). It supports saving and loading this state via state_dict() and load_state_dict() to maintain continuity when an experiment is resumed (areal/utils/stats_logger.py:114-120).
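The commit and resume behavior, together with the rank-0 gating described earlier for StatsLogger, can be sketched as below. StatsLoggerSketch and its `sink` callable are hypothetical: the sink stands in for the real backends, and silently skipping an already-committed step is just one possible policy (the real logger may warn or raise instead). The point is that persisting _last_commit_step lets a resumed run avoid logging the same global step twice.

```python
class StatsLoggerSketch:
    """Hypothetical sketch of rank-0 commits plus resumable step tracking.

    `sink` stands in for the real backends; it is any callable taking
    (stats: dict, step: int).
    """

    def __init__(self, rank, sink):
        self.rank = rank
        self.sink = sink
        self._last_commit_step = -1

    def commit(self, global_step, stats):
        if self.rank != 0:
            return  # only rank 0 talks to external services
        if global_step <= self._last_commit_step:
            return  # step already logged (e.g. replayed after a resume)
        self.sink(stats, global_step)
        self._last_commit_step = global_step

    def state_dict(self):
        return {"last_commit_step": self._last_commit_step}

    def load_state_dict(self, state):
        self._last_commit_step = state["last_commit_step"]


logged = []
logger = StatsLoggerSketch(rank=0, sink=lambda s, step: logged.append((step, s)))
logger.commit(0, {"loss": 1.0})
logger.commit(1, {"loss": 0.8})
ckpt = logger.state_dict()

resumed = StatsLoggerSketch(rank=0, sink=lambda s, step: logged.append((step, s)))
resumed.load_state_dict(ckpt)
resumed.commit(1, {"loss": 0.8})  # duplicate step after resume: ignored
resumed.commit(2, {"loss": 0.7})
print([step for step, _ in logged])  # [0, 1, 2]
```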

Sources: areal/utils/stats_tracker.py:85-94, areal/utils/stats_logger.py:114-120, areal/utils/stats_logger.py:145-159, areal/utils/evaluator.py:10-20


URL: https://deepwiki.com/inclusionAI/AReaL/13.5-metrics-tracking