
URL: https://deepwiki.com/inclusionAI/AReaL/2.7-microbatchspec-and-data-configurations

MicroBatchSpec and Data Configurations | inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

MicroBatchSpec and Data Configurations

This page documents the micro-batching configuration system and data processing pipeline in AReaL. Micro-batching is a critical component for efficient distributed training, allowing large batches to be processed in smaller chunks to manage GPU memory and enable gradient accumulation.

Scope: This page covers MicroBatchSpec configuration for controlling micro-batch splitting, the data structures used during training (MicroBatchList, MicroBatchItem), and the data processing pipeline from raw sequences to packed tensors. For dataset loading configurations, see 10.5. Datasets and Reward Functions.


MicroBatchSpec Overview

The MicroBatchSpec dataclass controls how batches are divided into micro-batches during training. This configuration is specified in TrainEngineConfig.mb_spec and applies to both forward and backward passes.

MicroBatch Configuration Flow

[Diagram: how configuration parameters from MicroBatchSpec drive the splitting logic in areal/utils/data.py.]


Sources: areal/api/cli_args.py:99-139, areal/utils/data.py:477-593, areal/utils/data.py:244-270


MicroBatchSpec Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_mbs | int \| None | 1 | Number of micro-batches (or minimum number if max_tokens_per_mb is set). areal/api/cli_args.py:102-108 |
| granularity | int | 1 | Adjacent sequences are grouped by this size when dividing micro-batches. Useful for group-based algorithms like GRPO. areal/api/cli_args.py:109-114 |
| max_tokens_per_mb | int \| None | None | Maximum tokens per micro-batch for each forward pass. When set, n_mbs becomes the minimum count. areal/api/cli_args.py:115-120 |
| n_mbs_divisor | int | 1 | Divisor for the number of micro-batches. Final count will be adjusted to be divisible by this. areal/api/cli_args.py:121-126 |
| packing_algorithm | str | "ffd" | Sequence packing algorithm for allocation. Supported: "ffd" (First Fit Decreasing), "kk" (Karmarkar-Karp). areal/api/cli_args.py:127-140 |

Parameter Interactions

The micro-batch count and assignment are determined by:

  1. If max_tokens_per_mb is None: Uses exactly n_mbs micro-batches (areal/api/cli_args.py:102-108).
  2. If max_tokens_per_mb is set: Uses the selected packing_algorithm to allocate sequences, with n_mbs as the minimum count (areal/api/cli_args.py:115-120).
  3. Granularity constraint: Batch size must be divisible by granularity. This is validated in split_padded_tensor_dict_into_mb_list (areal/utils/data.py:500-502).
  4. Divisor constraint: Final micro-batch count is adjusted to be divisible by n_mbs_divisor (areal/api/cli_args.py:121-126).
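
The interaction rules above can be sketched as a small function. This is only an illustration under stated assumptions: resolve_n_mbs is a hypothetical name, the token-budget estimate is a simple lower bound rather than the result of a real packing algorithm, and rounding up (rather than down) to a multiple of n_mbs_divisor is an assumed direction.

```python
import math


def resolve_n_mbs(seq_lens, n_mbs=1, max_tokens_per_mb=None, n_mbs_divisor=1):
    """Hypothetical helper: estimate a micro-batch count from the rules above."""
    n_mbs = n_mbs or 1  # treat None as the documented default of 1
    if max_tokens_per_mb is None:
        count = n_mbs  # rule 1: the requested count is used directly
    else:
        # Rule 2: n_mbs is only a minimum; the token budget may demand more.
        # ceil(total / budget) is a lower bound, not what a real packer returns.
        needed = math.ceil(sum(seq_lens) / max_tokens_per_mb)
        count = max(n_mbs, needed)
    # Rule 4: round up to the next multiple of n_mbs_divisor (assumed direction).
    return math.ceil(count / n_mbs_divisor) * n_mbs_divisor


lens = [100, 200, 300]
fixed = resolve_n_mbs(lens, n_mbs=2)
budgeted = resolve_n_mbs(lens, n_mbs=1, max_tokens_per_mb=256)
even = resolve_n_mbs(lens, n_mbs=1, max_tokens_per_mb=256, n_mbs_divisor=2)
```

With a 256-token budget the 600 tokens need at least three micro-batches, and a divisor of 2 lifts that to four.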

Distributed synchronization: When running distributed training, allocate_balanced_mbs_synced() ensures all data parallel ranks agree on the number of micro-batches by taking the maximum across ranks (areal/utils/data.py:256-270).
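
A rough, single-process simulation of that agreement step is shown below. The function names are hypothetical and the "ranks" are plain list entries; a real job would use a collective operation instead, and the per-rank count here is again just a token-budget lower bound.

```python
import math


def local_mb_count(seq_lens, max_tokens_per_mb, n_mbs=1):
    """Lower-bound count one rank would need for its own sequences."""
    return max(n_mbs, math.ceil(sum(seq_lens) / max_tokens_per_mb))


def synced_mb_count(per_rank_seq_lens, max_tokens_per_mb, n_mbs=1):
    """Simulate the agreement step: every rank adopts the global maximum."""
    local = [local_mb_count(s, max_tokens_per_mb, n_mbs) for s in per_rank_seq_lens]
    # In a real job this would be a collective across processes; here the
    # maximum is taken over an ordinary Python list.
    return max(local), local


agreed, local = synced_mb_count([[100, 200], [900, 300, 50], [400]], 512)
```

Ranks that need fewer micro-batches than the agreed count still run the same number of steps, which keeps collective calls aligned across ranks.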

Sources: areal/api/cli_args.py:99-139, areal/utils/data.py:244-270


Sequence Packing Algorithms

AReaL supports configurable algorithms for micro-batch allocation. The selection is handled via get_allocate_fn(algorithm) in areal/utils/seqpack.py:167-188.

| Algorithm | Key | Description | Balance Quality |
| --- | --- | --- | --- |
| First Fit Decreasing | ffd | Greedy bin-packing. Sorts sequences by length and assigns to the first bin with capacity. areal/utils/seqpack.py:196-203 | Good |
| Karmarkar-Karp | kk | Largest Differencing Method. Iteratively merges imbalanced partial partitions using a max-heap. areal/utils/seqpack.py:214-221 | Excellent |
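
To make the "ffd" row concrete, here is a minimal textbook First Fit Decreasing sketch. It is not the code in areal/utils/seqpack.py; ffd_allocate and its signature are hypothetical, and it works on plain Python lists of sequence lengths.

```python
def ffd_allocate(seq_lens, capacity):
    """First Fit Decreasing: return bins as lists of indices into seq_lens."""
    # Visit sequences from longest to shortest.
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    bins, loads = [], []
    for i in order:
        n = seq_lens[i]
        if n > capacity:
            raise ValueError(f"sequence {i} ({n} tokens) exceeds capacity {capacity}")
        # Place the sequence in the first bin that still has room.
        for b, load in enumerate(loads):
            if load + n <= capacity:
                bins[b].append(i)
                loads[b] += n
                break
        else:
            # No existing bin fits: open a new one.
            bins.append([i])
            loads.append(n)
    return bins


bins = ffd_allocate([5, 3, 8, 2, 4], capacity=10)
```

For these lengths the bins hold 10, 9, and 3 tokens, which shows the greedy behavior: early bins fill up while the last one can stay lightly loaded.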

When to use KK

The Karmarkar-Karp (kk) algorithm is recommended for large-scale RL training with highly variable sequence lengths (e.g., PPO with open-ended generation) or a high data-parallel degree (≥4 ranks), where even small imbalances cause significant idle time at synchronization barriers (areal/api/cli_args.py:132-135).
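
For intuition, the Largest Differencing Method can be sketched as follows. This is a generic k-way version written for illustration, not AReaL's implementation: kk_partition is a hypothetical name, it balances token sums across exactly k parts, and it does not enforce any max_tokens_per_mb budget.

```python
import heapq
from itertools import count


def kk_partition(seq_lens, k):
    """Largest Differencing Method: split indices into k token-balanced parts."""
    if not seq_lens:
        return [[] for _ in range(k)]
    tie = count()  # tie-breaker so the heap never compares partitions directly
    heap = []
    for i, n in enumerate(seq_lens):
        # One partial partition per sequence: k parts, only the first non-empty.
        parts = [(n, [i])] + [(0, []) for _ in range(k - 1)]
        spread = parts[0][0] - parts[-1][0]
        heapq.heappush(heap, (-spread, next(tie), parts))  # max-heap on spread
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        # Both are sorted by descending sum; pair a's heaviest part with b's
        # lightest so that the two imbalances partially cancel.
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda p: p[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, next(tie), merged))
    return [indices for _, indices in heap[0][2]]


lens = [8, 7, 6, 5, 4]
parts = kk_partition(lens, k=2)
sums = sorted(sum(lens[i] for i in p) for p in parts)
```

On this input the two parts carry 14 and 16 tokens. The method is a heuristic, so it is close to balanced but not guaranteed optimal.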

Sources: areal/utils/seqpack.py:161-188, areal/api/cli_args.py:127-140


Data Processing Pipeline

The data processing pipeline transforms raw sequences into micro-batched tensors ready for model consumption.


Sources: areal/utils/data.py:105-145, areal/utils/data.py:273-322, areal/utils/data.py:477-593, areal/utils/data.py:693-847

Sequence Packing

Purpose: Packing converts padded 2D tensors [B, S] into 1D packed tensors [total_len] by removing padding, improving memory efficiency for variable-length sequences (areal/utils/data.py:273-280).

Key function: pack_tensor_dict(data) at areal/utils/data.py:273-322

Output: Dictionary with packed tensors and sequence-boundary metadata, including the cu_seqlens offsets that unpack_sequence consumes.
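
The idea behind packing can be illustrated without tensors. The sketch below uses nested Python lists in place of [B, S] tensors; pack_padded is a hypothetical name and the returned key names are assumptions for illustration, not the documented output of pack_tensor_dict.

```python
def pack_padded(input_ids, attention_mask):
    """Drop padded positions from [B, S] lists and record sequence boundaries."""
    packed, cu_seqlens = [], [0]
    for row, mask in zip(input_ids, attention_mask):
        kept = [tok for tok, m in zip(row, mask) if m]  # keep unmasked tokens only
        packed.extend(kept)
        # Cumulative offsets: sequence i occupies packed[cu[i]:cu[i + 1]].
        cu_seqlens.append(cu_seqlens[-1] + len(kept))
    return {"input_ids": packed, "cu_seqlens": cu_seqlens}


out = pack_padded(
    [[11, 12, 13, 0], [21, 22, 0, 0], [31, 32, 33, 34]],
    [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]],
)
```

Twelve padded slots shrink to nine real tokens, and the offsets [0, 3, 5, 9] are enough to recover each sequence later.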

Sequence Unpacking

Key function: unpack_sequence(x, cu_seqlens) at areal/utils/data.py:228-241. Splits a packed tensor back into variable-length sequences using cu_seqlens. Used during loss computation and output processing.
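
A list-based stand-in shows what splitting at cu_seqlens means. The name unpack_sequence_sketch is hypothetical and the real function operates on tensors; only the slicing logic is being illustrated.

```python
def unpack_sequence_sketch(x, cu_seqlens):
    """Slice a packed sequence at consecutive cu_seqlens offsets."""
    return [x[start:end] for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


seqs = unpack_sequence_sketch([11, 12, 13, 21, 22, 31, 32, 33, 34], [0, 3, 5, 9])
```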


MicroBatchList Structure

MicroBatchList is the central data structure that flows through the training pipeline. It encapsulates micro-batches and their metadata.


Sources: areal/utils/data.py:385-471

Core Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| data | dict[str, Any] | Original input data (before splitting). areal/utils/data.py:387-388 |
| mb_spec | MicroBatchSpec | Configuration used to create this list. areal/utils/data.py:389-390 |
| mbs | list[dict[str, Any]] | List of original (unpadded) micro-batch dictionaries. areal/utils/data.py:391-392 |
| forward_indices | list[int] | Sequence reordering from original to micro-batch order. areal/utils/data.py:395-396 |
| backward_indices | list[int] | Reverse mapping (micro-batch order to original order). areal/utils/data.py:397-398 |
| padded_mbs | list[dict] \| None | Padded micro-batches ready for model forward (set by pad_mb_list()). areal/utils/data.py:400-401 |
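
The relationship between forward_indices and backward_indices can be sketched as a permutation and its inverse. This assumes the convention that forward_indices[j] is the original position of the j-th element in micro-batch order; invert_permutation is a hypothetical helper.

```python
def invert_permutation(forward_indices):
    """Build the mapping that undoes a reordering given by forward_indices."""
    backward = [0] * len(forward_indices)
    for new_pos, orig_pos in enumerate(forward_indices):
        backward[orig_pos] = new_pos
    return backward


batch = ["a", "b", "c", "d"]
forward_indices = [2, 0, 3, 1]  # assumed: element j comes from original slot forward_indices[j]
reordered = [batch[i] for i in forward_indices]  # micro-batch order
backward_indices = invert_permutation(forward_indices)
restored = [reordered[i] for i in backward_indices]  # back to original order
```

Applying backward_indices to per-sequence outputs returns them to the order of the input batch, which is what downstream code that indexes by original position needs.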

MicroBatchItem

When iterating over MicroBatchList, each iteration yields a MicroBatchItem named tuple:

| Field | Type | Purpose |
| --- | --- | --- |
| orig_mb | dict[str, Any] | Original micro-batch (for loss weight computation). areal/utils/data.py:369-370 |
| padded_mb | dict[str, Any] | Padded micro-batch (for model forward pass). areal/utils/data.py:371-372 |
| padding_length | int | Batch-level padding added (for output unpadding). areal/utils/data.py:373-374 |
| old_cu_seqlens | Tensor \| None | Original cu_seqlens before sequence alignment. areal/utils/data.py:375-378 |
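
As a rough picture of how these fields might be consumed, the stand-in below mirrors the four fields in a NamedTuple and trims padding from a model output. MicroBatchItemSketch and unpad_outputs are hypothetical, lists stand in for tensors, and it assumes the padding is appended at the end of the token dimension.

```python
from typing import Any, NamedTuple, Optional


class MicroBatchItemSketch(NamedTuple):
    """Stand-in with the documented field names; not the real AReaL type."""
    orig_mb: dict
    padded_mb: dict
    padding_length: int
    old_cu_seqlens: Optional[Any]


def unpad_outputs(outputs, item):
    """Drop trailing padded positions (assumes padding sits at the end)."""
    # Slicing by explicit length avoids the outputs[:-0] pitfall when no
    # padding was added.
    return outputs[: len(outputs) - item.padding_length]


item = MicroBatchItemSketch(
    orig_mb={"input_ids": [1, 2, 3, 4, 5]},
    padded_mb={"input_ids": [1, 2, 3, 4, 5, 0, 0, 0]},
    padding_length=3,
    old_cu_seqlens=None,
)
logits = [0.1] * len(item.padded_mb["input_ids"])  # one fake output per position
trimmed = unpad_outputs(logits, item)
```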

Sources: areal/utils/data.py:367-383, areal/utils/data.py:417-443


Micro-Batch Splitting Algorithm

The core splitting logic is implemented in split_padded_tensor_dict_into_mb_list().

Algorithm Steps

  1. Extract sequence lengths: Sum attention_mask along sequence dimension (areal/utils/data.py:495-498).
  2. Group by granularity: If granularity > 1, group adjacent sequences and sum their lengths (areal/utils/data.py:503-510).
  3. Bin packing: Use allocate_balanced_mbs_synced() with the selected packing_algorithm to assign groups to micro-batches (areal/utils/data.py:512-520).
  4. Flatten to sequence indices: Convert group assignments to per-sequence indices (areal/utils/data.py:521-527).
  5. Reorder tensors: Reorganize all tensors according to forward_indices (areal/utils/data.py:532-536).
  6. Split: Divide reordered tensors into separate micro-batch dictionaries (areal/utils/data.py:558-581).
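
Steps 1 through 4 can be sketched in plain Python as below. This is a simplified illustration, not the body of split_padded_tensor_dict_into_mb_list(): the function name is hypothetical, nested lists replace tensors, and step 3 uses a simple "longest group into the lightest bin" allocator as a stand-in for the configured packing algorithm.

```python
def plan_micro_batches(attention_mask, granularity, n_mbs):
    """Return per-micro-batch sequence indices and the flattened forward order."""
    # Step 1: sequence length = number of unmasked positions in each row.
    seq_lens = [sum(row) for row in attention_mask]
    if len(seq_lens) % granularity != 0:
        raise ValueError("batch size must be divisible by granularity")
    # Step 2: adjacent sequences form groups whose lengths are summed.
    group_lens = [
        sum(seq_lens[g:g + granularity])
        for g in range(0, len(seq_lens), granularity)
    ]
    # Step 3 (stand-in allocator): longest group goes to the lightest bin.
    bins = [[] for _ in range(n_mbs)]
    loads = [0] * n_mbs
    for g in sorted(range(len(group_lens)), key=lambda g: group_lens[g], reverse=True):
        b = loads.index(min(loads))
        bins[b].append(g)
        loads[b] += group_lens[g]
    # Step 4: expand each group index back into its member sequence indices.
    mb_indices = [
        [g * granularity + j for g in groups for j in range(granularity)]
        for groups in bins
    ]
    forward_indices = [i for mb in mb_indices for i in mb]
    return mb_indices, forward_indices


mask = [
    [1, 0, 0, 0], [1, 0, 0, 0],  # group 0: 2 tokens
    [1, 1, 1, 1], [1, 1, 1, 1],  # group 1: 8 tokens
    [1, 1, 0, 0], [1, 1, 0, 0],  # group 2: 4 tokens
]
mb_indices, forward_indices = plan_micro_batches(mask, granularity=2, n_mbs=2)
```

Steps 5 and 6 then amount to indexing every tensor with forward_indices and cutting the result at the micro-batch boundaries. Note that members of a group always land in the same micro-batch.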

Distributed synchronization: allocate_balanced_mbs_synced() ensures all data parallel ranks agree on the number of micro-batches by all-gathering counts and taking the maximum (areal/utils/data.py:256-270). This prevents deadlocks in pipeline parallel training.

Sources: areal/utils/data.py:256-270, areal/utils/data.py:477-593


Padding and Alignment

After splitting, micro-batches must be padded to uniform lengths for efficient batch processing via pad_mb_list().

Padding Strategies

| Strategy | When Used | Behavior |
| --- | --- | --- |
| Dynamic padding | Default (pad_to_maximum=False) | Each micro-batch padded to its own max sequence length. areal/utils/data.py:703-706 |
| Maximum padding | pad_to_maximum=True | All micro-batches padded to global maximum length. areal/utils/data.py:707-709 |
| Sequence alignment | Context parallel enabled | Sequences aligned to multiples of seq_align_to (typically CP size). areal/utils/data.py:711-715 |
| Page alignment | Memory optimization | Optionally align to page boundaries (e.g. 256 tokens). areal/utils/data.py:716-720 |
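
The arithmetic behind these strategies is mostly rounding up to a multiple. The sketch below is a deliberate simplification: it treats each micro-batch as a single length, padded_lengths and page_size are hypothetical names, and the order in which the strategies are combined is an assumption rather than a description of pad_mb_list().

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def padded_lengths(mb_lengths, pad_to_maximum=False, seq_align_to=1, page_size=None):
    """Target length per micro-batch under the strategies in the table."""
    targets = list(mb_lengths)  # dynamic padding: each keeps its own length
    if pad_to_maximum:
        targets = [max(targets)] * len(targets)  # everyone matches the longest
    targets = [round_up(t, seq_align_to) for t in targets]  # alignment
    if page_size:
        targets = [round_up(t, page_size) for t in targets]  # page boundaries
    return targets


dynamic = padded_lengths([100, 300, 180])
maximum = padded_lengths([100, 300, 180], pad_to_maximum=True)
aligned = padded_lengths([100, 300, 180], seq_align_to=8)
paged = padded_lengths([100, 300, 180], page_size=256)
```

Dynamic padding wastes the fewest tokens, while maximum padding and page alignment trade extra padding for uniform shapes.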

Sources: areal/utils/data.py:693-847


Configuration Examples

Example 1: GRPO with Group Granularity


Behavior: Groups every 4 consecutive sequences together for GRPO group-based optimization (areal/api/cli_args.py:109-114). Each group is treated as an indivisible unit during micro-batch assignment.

Example 2: Balanced Allocation with KK
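
The field values for such a setup could be written as follows. The dictionary form and the variable names are illustrative only; n_mbs=4 and packing_algorithm="kk" follow from the description of this example, while the token budget is an invented placeholder.

```python
SUPPORTED_PACKING = {"ffd", "kk"}  # keys listed in the algorithm table

# Hypothetical MicroBatchSpec field values expressed as keyword arguments.
kk_spec_fields = {
    "n_mbs": 4,                  # acts as a minimum once a token budget is set
    "max_tokens_per_mb": 16384,  # assumed budget; tune for available GPU memory
    "packing_algorithm": "kk",   # Karmarkar-Karp allocation
}

if kk_spec_fields["packing_algorithm"] not in SUPPORTED_PACKING:
    raise ValueError("unsupported packing algorithm")
```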


Behavior: Uses the Karmarkar-Karp algorithm to distribute sequences into at least 4 micro-batches, ensuring near-optimal token balance across micro-batches to maximize hardware utilization (areal/api/cli_args.py:131-135).

Sources: areal/api/cli_args.py:99-139, areal/utils/data.py:477-593


Advanced Features

Multi-Modal Data Support

The micro-batching system supports multi-modal inputs (e.g., vision-language models) through special handling of multi_modal_input keys in split_padded_tensor_dict_into_mb_list() (areal/utils/data.py:566-577). For vision models, _prepare_multimodal_forward_inputs is used to manage large tensors like pixel_values between original and padded micro-batches (areal/engine/fsdp_engine.py:192-215).

Normalization Configurations

Reward and advantage normalization can be configured via NormConfig, supporting batch-level or group-level statistics.

| Parameter | Default | Description |
| --- | --- | --- |
| mean_level | "batch" | Level for mean normalization (batch, group, or None). areal/api/cli_args.py:46-52 |
| std_level | "batch" | Level for std normalization (batch, group, or None). areal/api/cli_args.py:57-63 |
| group_size | 1 | Size of groups for group-level normalization. areal/api/cli_args.py:76-78 |
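
The difference between batch-level and group-level statistics can be sketched as below. This is an illustration of the concept only: normalize is a hypothetical function, and the use of population standard deviation, the eps term, and the assumption that groups are contiguous blocks of group_size values are all choices made for the example, not statements about AReaL's implementation.

```python
import statistics


def normalize(values, mean_level="batch", std_level="batch", group_size=1, eps=1e-5):
    """Subtract a mean and divide by a std computed at the chosen levels."""

    def per_value(level, stat):
        if level == "batch":
            return [stat(values)] * len(values)
        if level == "group":
            out = []
            for start in range(0, len(values), group_size):
                group = values[start:start + group_size]
                out.extend([stat(group)] * len(group))
            return out
        raise ValueError(f"unknown level: {level!r}")

    # A level of None skips that part of the normalization.
    means = per_value(mean_level, statistics.fmean) if mean_level else [0.0] * len(values)
    centered = [v - m for v, m in zip(values, means)]
    if std_level is None:
        return centered
    stds = per_value(std_level, statistics.pstdev)
    return [c / (s + eps) for c, s in zip(centered, stds)]


rewards = [1.0, 3.0, 10.0, 14.0]
group_centered = normalize(rewards, mean_level="group", std_level=None, group_size=2)
batch_normed = normalize(rewards)
```

With group-level means, each pair is centered on its own average, so a high-reward group does not overshadow a low-reward one.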

Sources: areal/api/cli_args.py:43-97, areal/utils/data.py:88-91, areal/engine/fsdp_engine.py:192-215