URL: https://deepwiki.com/inclusionAI/AReaL/13.1-performance-tracing

Last indexed: 7 May 2026 (2e12c1)
Performance Tracing

Purpose and Overview

AReaL provides two complementary performance tracing systems, both implemented in areal/utils/perf_tracer.py1-1477. Together they cover method-level performance profiling and session lifecycle tracking for asynchronous RL workflows:

  1. PerfTracer: Method-level instrumentation using decorators and context managers to capture operation durations, categorized by type (compute, communication, I/O, etc.) areal/utils/perf_tracer.py62-93
  2. SessionTracer: High-level tracking of rollout session lifecycles, including phase breakdowns (generate, reward, toolcall) and derived metrics areal/utils/perf_tracer.py425-672

Both systems write Chrome Trace-compatible JSONL files. These can be converted for viewing in Chrome's chrome://tracing or rendered as interactive HTML plots with the provided tools areal/tools/perf_trace_converter.py1-10 areal/tools/plot_session_trace.py1-16

Key Features:

Sources: areal/utils/perf_tracer.py1-1477 areal/tools/perf_trace_converter.py1-10 areal/tools/plot_session_trace.py1-16

Architecture Overview

The tracing system uses ContextVar to maintain tracing state across asynchronous boundaries and provides both synchronous and asynchronous APIs.
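The role of ContextVar here can be illustrated with a minimal, self-contained sketch. It does not use AReaL's code: the variable name `current_session_id` and the helper coroutines are hypothetical. It only shows the standard-library behavior that makes this design work, namely that each asyncio task gets its own copy of the context, so a value set in one task survives `await` points without leaking into sibling tasks.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-in for a tracer's "current session" context variable.
current_session_id: ContextVar[Optional[int]] = ContextVar(
    "current_session_id", default=None
)


async def record_phase(phase: str) -> tuple:
    # Yield to the event loop, as a real awaited call would.
    await asyncio.sleep(0)
    # The value set by the enclosing task is still visible after the await.
    return (current_session_id.get(), phase)


async def run_session(session_id: int) -> list:
    # This assignment is visible only inside this task's context.
    current_session_id.set(session_id)
    return [await record_phase("generate"), await record_phase("reward")]


async def main() -> list:
    # gather() wraps each coroutine in a task with its own context copy.
    return await asyncio.gather(run_session(1), run_session(2))


results = asyncio.run(main())
```

Because the two sessions interleave at every `await`, a plain global variable would mix up their IDs; the context variable keeps them separate.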

[Diagram: System Architecture: Tracing Context and Flow]


Sources: areal/utils/perf_tracer.py30-40 areal/utils/perf_tracer.py117-118 areal/utils/perf_tracer.py425-450

Configuration

Performance tracing is configured via PerfTracerConfig and SessionTracerConfig dataclasses areal/api/cli_args.py25

| Config Class | Purpose | Key Fields |
|---|---|---|
| PerfTracerConfig | Method-level tracing | fileroot, experiment_name, trial_name, save_interval |
| SessionTracerConfig | Session lifecycle tracking | flush_threshold (sessions buffered before write) |

Both tracers write rank-qualified JSONL files to standardized paths using _default_trace_path areal/utils/perf_tracer.py199-230
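To make "rank-qualified" concrete, the sketch below builds one trace path per rank from the configuration fields named above. Everything beyond those field names is an assumption: the `TracerPathConfig` class, the directory layout, and the `-r{rank}` filename suffix are hypothetical and may differ from what `_default_trace_path` actually produces.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class TracerPathConfig:
    """Hypothetical stand-in; not AReaL's PerfTracerConfig."""

    fileroot: str
    experiment_name: str
    trial_name: str


def rank_qualified_trace_path(
    cfg: TracerPathConfig, rank: int, filename: str = "traces.jsonl"
) -> str:
    # Assumed layout: <fileroot>/<experiment>/<trial>/<stem>-r<rank><ext>.
    # posixpath keeps forward slashes so the example reads the same everywhere.
    stem, ext = posixpath.splitext(filename)
    return posixpath.join(
        cfg.fileroot, cfg.experiment_name, cfg.trial_name, f"{stem}-r{rank}{ext}"
    )


cfg = TracerPathConfig(fileroot="/tmp/areal", experiment_name="exp", trial_name="trial0")
paths = [rank_qualified_trace_path(cfg, rank) for rank in range(2)]
```

Giving each rank its own file means processes never contend for a shared file handle, and the per-rank files can be merged later by a converter.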

Sources: areal/utils/perf_tracer.py199-230 areal/utils/perf_tracer.py117-118 areal/api/cli_args.py25

PerfTracer: Method-Level Performance Tracking

PerfTracer instruments individual methods and code blocks to capture operation durations and emit Chrome Trace events.

Event Categories

Operations are classified using the PerfTraceCategory enum areal/utils/perf_tracer.py62-93:

| Category | Description | Example Use Cases |
|---|---|---|
| COMPUTE | CPU/GPU computation | Forward/backward passes, loss calculation |
| COMM | Distributed communication | All-reduce, broadcast, P2P transfers |
| IO | Disk I/O operations | Checkpoint save/load, data loading |
| SYNC | Synchronization primitives | Barriers, locks, waits |
| SCHEDULER | Task scheduling | Queue operations, worker dispatch |
| INSTR | Instrumentation overhead | Profiling/tracing bookkeeping |
| MISC | Uncategorized events | General utility operations |

Sources: areal/utils/perf_tracer.py62-93

Instrumentation Methods

Decorator Usage (@trace_perf)


Context Manager (Sync and Async)


Sources: areal/utils/perf_tracer.py1161-1250 areal/utils/perf_tracer.py1255-1340

SessionTracer: Rollout Session Lifecycle Tracking

SessionTracer tracks the complete lifecycle of individual rollout sessions from submission through finalization, capturing phase-level breakdowns and derived metrics areal/utils/perf_tracer.py425-672

SessionRecord Structure

Each session is represented by a SessionRecord areal/utils/perf_tracer.py425-450

[Diagram: Session Lifecycle State Machine]


Key Fields in SessionRecord areal/utils/perf_tracer.py425-450:

  • task_id: Identifier for the dataset-level task.
  • session_id: Unique session identifier (auto-incremented).
  • submit_ts: Submission timestamp (wall-clock time).
  • finalized_ts: Finalization timestamp.
  • status: Current state ("pending", "accepted", "rejected", "failed", "dropped").
  • phases: Dict mapping phase name to list of PhaseSpan executions.
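The fields listed above can be pictured as a small dataclass. This is an illustrative sketch, not the class in perf_tracer.py: the `Sketch` suffix marks both classes as stand-ins, and the span fields (`start_ts`, `end_ts`), the field types, and the helper methods are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FINAL_STATUSES = {"accepted", "rejected", "failed", "dropped"}


@dataclass
class PhaseSpanSketch:
    """One execution of a phase; field names are assumptions."""

    start_ts: float
    end_ts: Optional[float] = None


@dataclass
class SessionRecordSketch:
    """Stand-in mirroring the documented fields of a session record."""

    task_id: int
    session_id: int
    submit_ts: float
    finalized_ts: Optional[float] = None
    status: str = "pending"
    phases: Dict[str, List[PhaseSpanSketch]] = field(default_factory=dict)

    def enter_phase(self, phase: str, ts: float) -> None:
        # A phase may run several times, so each entry appends a new span.
        self.phases.setdefault(phase, []).append(PhaseSpanSketch(start_ts=ts))

    def exit_phase(self, phase: str, ts: float) -> None:
        # Close the most recently opened span of this phase.
        self.phases[phase][-1].end_ts = ts

    def finalize(self, status: str, ts: float) -> None:
        if status not in FINAL_STATUSES:
            raise ValueError(f"not a final status: {status!r}")
        self.status = status
        self.finalized_ts = ts


record = SessionRecordSketch(task_id=7, session_id=0, submit_ts=100.0)
record.enter_phase("generate", 100.5)
record.exit_phase("generate", 103.5)
record.finalize("accepted", 104.0)
```

Storing a list of spans per phase, rather than a single start and end, is what allows multi-turn sessions to record repeated generate or toolcall phases.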

Sources: areal/utils/perf_tracer.py425-450

Instrumentation in Workflows

Workflow implementations use decorators and context managers to automatically track sessions.

Implementation in RLVRWorkflow areal/workflow/rlvr.py82-136:

  • @session_context(): Registers a new session and sets the _current_session_id context variable areal/workflow/rlvr.py110
  • @trace_session(phase): Wraps a method to mark the start and end of a specific phase (e.g., "reward") areal/workflow/rlvr.py82
  • atrace_session_phase(phase): An async context manager for wrapping specific blocks like generation areal/workflow/rlvr.py129

Implementation in VisionRLVRWorkflow areal/workflow/vision_rlvr.py45-101:

Sources: areal/workflow/rlvr.py82-136 areal/workflow/vision_rlvr.py45-101 areal/utils/perf_tracer.py816-1050

Derived Metrics

SessionRecord automatically computes durations for phases by summing up spans in the phases dictionary areal/utils/perf_tracer.py593-650:

| Metric | Computation | Description |
|---|---|---|
| total_s | finalized_ts - submit_ts | Total session duration |
| generate_s | Sum of all generate phase spans | Time spent in generation |
| reward_s | Sum of all reward phase spans | Time spent computing rewards |
| toolcall_s | Sum of all toolcall phase spans | Time spent in tool calls |
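The computations in the table can be worked through on a small example. The function below is a sketch under stated assumptions: the record is a plain dict, and each span is a `(start, end)` tuple in seconds, which is a simplification of whatever PhaseSpan actually stores.

```python
def derive_metrics(record: dict) -> dict:
    """Compute the derived durations from a simplified session record."""

    def phase_total(name: str) -> float:
        # Sum the durations of every span recorded for this phase;
        # a phase that never ran contributes 0.0.
        return sum(
            (end - start for start, end in record["phases"].get(name, [])), 0.0
        )

    return {
        "total_s": record["finalized_ts"] - record["submit_ts"],
        "generate_s": phase_total("generate"),
        "reward_s": phase_total("reward"),
        "toolcall_s": phase_total("toolcall"),
    }


record = {
    "submit_ts": 100.0,
    "finalized_ts": 110.0,
    "phases": {
        "generate": [(100.5, 103.5), (105.0, 107.0)],  # two generation turns
        "reward": [(107.0, 108.0)],
    },
}
metrics = derive_metrics(record)
```

Here the session lasts 10 seconds in total, of which 5 are generation and 1 is reward computation. The remaining 4 seconds are not attributed to any phase, which is the kind of gap (queueing, waiting) that these metrics help expose.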

Sources: areal/utils/perf_tracer.py593-650

Visualization Tools

AReaL provides dedicated tools for converting and plotting trace data.

perf_trace_converter

This tool converts raw JSONL traces into a single JSON file compatible with Chrome's chrome://tracing areal/tools/perf_trace_converter.py1-10
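The core of such a conversion is small, and a minimal sketch follows. It is not the code in perf_trace_converter.py: the function name is hypothetical, and sorting by timestamp is a choice made for this example. It relies only on the Chrome Trace Event "JSON Object Format", in which events live under a top-level `traceEvents` key.

```python
import json


def jsonl_to_chrome_trace(lines):
    """Collect JSONL trace events into a single Chrome Trace JSON object."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        events.append(json.loads(line))
    # Sorting is not required by the viewer; it just makes the file easier to read.
    events.sort(key=lambda event: event.get("ts", 0))
    return {"traceEvents": events, "displayTimeUnit": "ms"}


sample = [
    '{"name": "all_reduce", "cat": "comm", "ph": "X", "ts": 200, "dur": 50, "pid": 1, "tid": 0}',
    "",
    '{"name": "forward", "cat": "compute", "ph": "X", "ts": 100, "dur": 80, "pid": 0, "tid": 0}',
]
trace = jsonl_to_chrome_trace(sample)
chrome_json = json.dumps(trace)
```

In practice the `lines` argument would be the concatenated contents of the per-rank JSONL files, and `chrome_json` would be written to a `.json` file and loaded in chrome://tracing.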

Key Functions:

Sources: areal/tools/perf_trace_converter.py1-215

plot_session_trace

Generates interactive HTML visualizations of session trace data using Plotly areal/tools/plot_session_trace.py1-16

Plot Types:

  1. Lifecycle Timeline: Gantt-style timeline showing session phases (Generate, Reward, Tool Call) with color-coded segments areal/tools/plot_session_trace.py30-42
  2. Duration Histograms: Distribution of time spent in different phases, using automated binning logic areal/tools/plot_session_trace.py55-60 areal/tools/plot_session_trace.py65-129
  3. Status Colors: Visual differentiation of accepted, rejected, failed, or dropped sessions areal/tools/plot_session_trace.py31-36

Sources: areal/tools/plot_session_trace.py1-129

Trace Data Flow

[Diagram: Data Flow: Code to Trace Output. Shows the flow from code entities to the final trace files.]


Sources: areal/workflow/rlvr.py82-136 areal/workflow/vision_rlvr.py45-101 areal/utils/perf_tracer.py117-118 areal/utils/perf_tracer.py652-667

Output Format

Both tracers write newline-delimited JSON (JSONL) files.

Performance Trace Format
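The original format listing is not reproduced here. As a hedged illustration, the sketch below writes and reads back two lines shaped like standard Chrome Trace "complete" events. The keys `name`, `cat`, `ph`, `ts`, `dur`, `pid`, `tid`, and `args` come from the Chrome Trace Event Format; the specific values, and any mapping of `pid` to a rank, are assumptions rather than AReaL's documented output.

```python
import io
import json

# One "complete" event: ts and dur are in microseconds.
train_step = {
    "name": "train_step",
    "cat": "compute",
    "ph": "X",
    "ts": 1_000_000,
    "dur": 250_000,
    "pid": 0,
    "tid": 0,
    "args": {"step": 1},
}
all_reduce = {**train_step, "name": "all_reduce", "cat": "comm",
              "ts": 1_250_000, "dur": 50_000}

# JSONL: one JSON object per line, so records can be appended incrementally.
buffer = io.StringIO()
for event in (train_step, all_reduce):
    buffer.write(json.dumps(event) + "\n")

parsed = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Appending one object per line lets a tracer flush partial results without rewriting the file, at the cost of needing a conversion step before the Chrome viewer can load it.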



Session Trace Format



Sources: areal/utils/perf_tracer.py117-118 areal/utils/perf_tracer.py652-667