URL: https://deepwiki.com/inclusionAI/AReaL/10.1-microbatch-system

Last indexed: 7 May 2026 (2e12c1)

MicroBatch System

The MicroBatch System provides utilities for splitting training batches into smaller micro-batches to enable gradient accumulation and efficient memory usage during distributed training. This system handles variable-length sequence batching, memory-efficient padding, and distributed synchronization of micro-batch allocations across training ranks.

For information about how micro-batches are configured, see 2.7. MicroBatchSpec and Data Configurations. For information about how micro-batches flow through training engines during forward/backward passes, see 3.5. Microbatching Pipeline.

Core Data Structures

MicroBatchSpec

MicroBatchSpec is the configuration dataclass that controls how batches are split into micro-batches. It supports both fixed-count splitting and dynamic token-balanced splitting, as well as configurable sequence packing algorithms.

MicroBatchSpec Structure

| Field | Type | Description |
| --- | --- | --- |
| n_mbs | `int \| None` | |
| granularity | `int` | Adjacent sequences are grouped by this size when dividing micro-batches. |
| max_tokens_per_mb | `int \| None` | |
| n_mbs_divisor | `int` | Final number of micro-batches will be adjusted to be divisible by this value. |
| packing_algorithm | `str` | Algorithm for sequence allocation. Supports "ffd" (default) or "kk" (Karmarkar-Karp). |

Sources: areal/api/cli_args.py:99-139
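As a rough illustration, the fields listed above could be mirrored in a standalone dataclass such as the one below. This is a sketch, not the definition in areal/api/cli_args.py: the class name `MicroBatchSpecSketch`, every default value other than `"ffd"`, and the check in `__post_init__` are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MicroBatchSpecSketch:
    """Hypothetical stand-in for MicroBatchSpec; defaults are assumed."""

    n_mbs: int | None = 1                 # assumed default
    granularity: int = 1                  # assumed default
    max_tokens_per_mb: int | None = None  # assumed default
    n_mbs_divisor: int = 1                # assumed default
    packing_algorithm: str = "ffd"        # "ffd" is the documented default

    def __post_init__(self) -> None:
        # Assumed validation: accept only the two documented algorithms.
        if self.packing_algorithm not in ("ffd", "kk"):
            raise ValueError(f"unknown packing_algorithm: {self.packing_algorithm!r}")


# Fixed-count splitting into four micro-batches.
fixed = MicroBatchSpecSketch(n_mbs=4)

# Token-balanced splitting with a per-micro-batch token budget and KK packing.
dynamic = MicroBatchSpecSketch(max_tokens_per_mb=8192, packing_algorithm="kk")
```

The two instances correspond to the fixed-count and token-balanced modes described above; the budget of 8192 tokens is an arbitrary illustrative value.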

MicroBatchItem

MicroBatchItem is a NamedTuple representing a single micro-batch during iteration. It contains both the original micro-batch data (used for loss computation context) and the padded micro-batch data (used for model forward pass).

MicroBatchItem Structure
| Field | Description |
| --- | --- |
| orig_mb | Original micro-batch dict before padding (for loss weight and context). |
| padded_mb | Padded micro-batch dict ready for model forward pass. |
| padding_length | Batch-level padding tokens added to this micro-batch. |
| old_cu_seqlens | Original cumulative sequence lengths before sequence alignment. |
| padded_to_length | The final padded sequence length for this micro-batch. |

Sources: areal/utils/data.py:367-383
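To make the relationship between the five fields concrete, here is a minimal sketch of a NamedTuple with the same field names. The class name `MicroBatchItemSketch`, the field types, the dict keys `input_ids` and `cu_seqlens`, and the use of plain lists instead of tensors are all assumptions for illustration.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class MicroBatchItemSketch(NamedTuple):
    """Hypothetical mirror of MicroBatchItem; field types are assumed."""

    orig_mb: dict[str, Any]
    padded_mb: dict[str, Any]
    padding_length: int
    old_cu_seqlens: list[int] | None
    padded_to_length: int | None


# Two packed sequences of lengths 3 and 2, padded with one dummy
# sequence of 3 tokens so that the micro-batch holds 8 tokens.
item = MicroBatchItemSketch(
    orig_mb={"input_ids": [11, 12, 13, 21, 22], "cu_seqlens": [0, 3, 5]},
    padded_mb={"input_ids": [11, 12, 13, 21, 22, 0, 0, 0], "cu_seqlens": [0, 3, 5, 8]},
    padding_length=3,
    old_cu_seqlens=[0, 3, 5],
    padded_to_length=8,
)

# A NamedTuple can be unpacked positionally or accessed by field name.
orig_mb, padded_mb, padding_length, *_ = item
```

The model would consume `padded_mb`, while the loss would be computed against `orig_mb` so that the padding tokens do not contribute.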

MicroBatchList

MicroBatchList is the primary container returned by splitting functions. It holds the split micro-batches along with metadata needed for reordering outputs and managing padding.

MicroBatchList Structure


Sources: areal/utils/data.py:385-472
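The reordering metadata can be pictured as a permutation and its inverse. The sketch below assumes the container stores a forward index list (original position of each sequence in micro-batch order) and a backward index list (its inverse); the names `forward_indices` and `backward_indices` are assumptions and are not confirmed by this page.

```python
def inverse_permutation(forward_indices):
    """Return the permutation that undoes `forward_indices`."""
    backward = [0] * len(forward_indices)
    for new_pos, old_pos in enumerate(forward_indices):
        backward[old_pos] = new_pos
    return backward


# Micro-batch 0 holds original sequences 2 and 0; micro-batch 1 holds 1 and 3.
groups = [[2, 0], [1, 3]]
forward_indices = [i for group in groups for i in group]
backward_indices = inverse_permutation(forward_indices)

# Outputs come back in micro-batch order and are restored to input order.
outputs_in_mb_order = [f"out{i}" for i in forward_indices]
restored = [outputs_in_mb_order[j] for j in backward_indices]
```

After the last line, `restored` lists the outputs in the same order as the sequences of the input batch.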

Batch Splitting Algorithm

The micro-batch splitting process transforms a padded batch into multiple micro-batches with balanced token counts to maximize throughput and minimize memory spikes.

High-Level Flow

Batch Splitting Data Flow


Sources: areal/utils/data.py:477-593

Token-Balanced Allocation

The allocate_balanced_mbs() function uses registry-based dispatch: it looks up the allocation function with get_allocate_fn(), keyed by the packing_algorithm field in MicroBatchSpec.

Allocation Algorithm Dispatch


  • FFD (First Fit Decreasing): A greedy heuristic that sorts sequences by decreasing length and places each one into the first micro-batch bin that still has room for it. areal/utils/seqpack.py:196-203
  • KK (Karmarkar-Karp): The Largest Differencing Method. It produces near-optimal balance by iteratively merging the two most imbalanced partial partitions. It is recommended for large-scale RL training with high variance in sequence lengths. areal/utils/seqpack.py:162-164

Sources: areal/utils/data.py:244-253, areal/utils/seqpack.py:167-188
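The two heuristics and the registry lookup can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the code in areal/utils/seqpack.py: the shared `(lengths, capacity)` signature, the helper names `ffd_allocate`, `kk_partition`, `kk_allocate`, and `get_allocate_fn_sketch`, and the strategy of growing the bin count until the KK result fits the capacity are all hypothetical.

```python
import heapq
import math


def ffd_allocate(lengths, capacity):
    """First Fit Decreasing: longest first, first bin with enough room."""
    if any(n > capacity for n in lengths):
        raise ValueError("a sequence exceeds the per-bin capacity")
    bins, loads = [], []
    for idx in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        for b, load in enumerate(loads):
            if load + lengths[idx] <= capacity:
                bins[b].append(idx)
                loads[b] += lengths[idx]
                break
        else:  # no existing bin had room, so open a new one
            bins.append([idx])
            loads.append(lengths[idx])
    return bins


def kk_partition(lengths, k):
    """k-way Karmarkar-Karp (largest differencing) over sequence indices."""
    if not lengths:
        return [[] for _ in range(k)]
    heap = []
    for i, n in enumerate(lengths):
        # One partial partition per sequence: k parts sorted by descending sum.
        parts = [(n, [i])] + [(0, []) for _ in range(k - 1)]
        heapq.heappush(heap, (-n, i, parts))
    tie = len(lengths)  # unique tie-breaker so parts are never compared
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)  # largest spread
        _, _, b = heapq.heappop(heap)  # second largest spread
        # Pair the heaviest parts of one with the lightest parts of the other.
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda part: -part[0])
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, tie, merged))
        tie += 1
    return [indices for _, indices in heap[0][2]]


def kk_allocate(lengths, capacity):
    """Assumed wrapper: grow the bin count until every bin fits the capacity."""
    if capacity <= 0 or any(n > capacity for n in lengths):
        raise ValueError("capacity must be positive and fit every sequence")
    k = max(1, math.ceil(sum(lengths) / capacity))
    while True:
        groups = [g for g in kk_partition(lengths, k) if g]
        if all(sum(lengths[i] for i in g) <= capacity for g in groups):
            return groups
        k += 1


ALLOCATORS = {"ffd": ffd_allocate, "kk": kk_allocate}


def get_allocate_fn_sketch(name):
    """Registry lookup keyed by the packing algorithm name."""
    try:
        return ALLOCATORS[name]
    except KeyError:
        raise ValueError(f"unknown packing algorithm: {name!r}") from None


lengths = [7, 5, 4, 3, 1]
for name in ("ffd", "kk"):
    groups = get_allocate_fn_sketch(name)(lengths, capacity=10)
    print(name, groups, [sum(lengths[i] for i in g) for g in groups])
```

Both functions return groups of sequence indices rather than the sequences themselves, which keeps the allocation step independent of how the batch tensors are later sliced.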

Distributed Synchronization

allocate_balanced_mbs_synced() ensures all ranks in a data-parallel group use the same number of micro-batches by performing an all_gather_object on the local micro-batch counts and taking the maximum.

Micro-batch Sync Sequence


Sources: areal/utils/data.py:256-270
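The gather-then-max step can be simulated without a process group. In the sketch below, `synced_n_mbs` and `fake_all_gather` are hypothetical names; in a real run the gather would be done by `torch.distributed.all_gather_object`, which fills a preallocated list with one object per rank.

```python
def synced_n_mbs(local_n_mbs, all_gather):
    """Return the micro-batch count every rank should use.

    `all_gather` is any callable that returns the value contributed by
    each rank as a list (a stand-in for the real collective).
    """
    return max(all_gather(local_n_mbs))


# Simulate three data-parallel ranks whose local allocations differ.
local_counts = [3, 5, 4]


def fake_all_gather(_local_value):
    # Every rank sees the same gathered list after the collective.
    return list(local_counts)


synced = [synced_n_mbs(count, fake_all_gather) for count in local_counts]
```

Every simulated rank ends up with the same count, the maximum of the local values, so each rank would then re-split its batch into that many micro-batches.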

Padding Strategies

Sequence Packing and Padding

AReaL uses packed sequence representation with cumulative sequence lengths (cu_seqlens) to minimize padding. Padding is applied at two levels:

  1. Sequence-level alignment: Aligns individual sequences to multiples of seq_align_to (required for Ulysses sequence parallelism). areal/utils/data.py:599-686
  2. Batch-level padding: Adds dummy sequences to reach a specific target length (often for memory alignment or Ulysses constraints). areal/utils/data.py:688-793
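The effect of the two padding levels on cu_seqlens can be sketched with plain lists. The function names `align_sequences` and `pad_batch`, and the choice to represent batch-level padding as a single dummy sequence, are assumptions for this example.

```python
def align_sequences(cu_seqlens, align_to):
    """Round every sequence length up to a multiple of `align_to`."""
    lengths = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]
    aligned = [-(-n // align_to) * align_to for n in lengths]  # ceiling division
    new_cu = [0]
    for n in aligned:
        new_cu.append(new_cu[-1] + n)
    return new_cu


def pad_batch(cu_seqlens, target_len):
    """Append one dummy sequence so the packed batch holds `target_len` tokens."""
    total = cu_seqlens[-1]
    if total > target_len:
        raise ValueError("batch is already longer than the target length")
    if total == target_len:
        return list(cu_seqlens), 0
    return list(cu_seqlens) + [target_len], target_len - total


cu_seqlens = [0, 3, 8, 10]                        # sequence lengths 3, 5, 2
aligned_cu = align_sequences(cu_seqlens, align_to=4)
padded_cu, padding_length = pad_batch(aligned_cu, target_len=32)
```

In this example the aligned lengths become 4, 8, and 4 (16 tokens in total), and batch-level padding then adds a 16-token dummy sequence to reach 32.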

Padding Function: pad_mb_list()

The pad_mb_list() function applies padding to each micro-batch in a MicroBatchList. It can pad to the maximum sequence length found in the list or to individual page-aligned lengths.

Padding Logic Flow


Sources: areal/utils/data.py:688-793
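The two target-length modes can be contrasted with a small helper. This sketch assumes that "maximum sequence length" means the largest total token count among the micro-batches and that page alignment means rounding up to a multiple of a page size; the function name `target_lengths`, the `pad_to_maximum` flag, and the page size of 256 are hypothetical.

```python
def target_lengths(mb_token_counts, pad_to_maximum, page_size=256):
    """Compute the padded length of each micro-batch (list must be non-empty)."""
    if pad_to_maximum:
        # Every micro-batch is padded to the longest one in the list.
        longest = max(mb_token_counts)
        return [longest] * len(mb_token_counts)
    # Each micro-batch is rounded up to its own page-aligned length.
    return [-(-n // page_size) * page_size for n in mb_token_counts]


token_counts = [900, 1030, 512]
uniform = target_lengths(token_counts, pad_to_maximum=True)
page_aligned = target_lengths(token_counts, pad_to_maximum=False)
```

Padding to a common maximum gives every micro-batch the same shape, while per-micro-batch page alignment adds fewer padding tokens when the token counts vary widely.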

Integration with Training Engines

All training engines follow a common pattern for micro-batch processing.

Common Training Flow

Engine Micro-batch Processing


Engine-Specific Implementation

  • FSDP Engine: FSDPEngine splits the batch with _prepare_mb_list and then iterates over the resulting list. Each micro-batch is prepared by _prepare_mb_inputs, which handles Ulysses sequence-parallel slicing and tree training logic. areal/engine/fsdp_engine.py:1090-1197
  • Megatron Engine: MegatronEngine calls _prepare_mb_list and passes the resulting list to its internal forward_backward_func. The forward_step callback extracts padded_mb from each MicroBatchItem provided by the iterator. areal/engine/megatron_engine.py:570-619, areal/engine/megatron_engine.py:1053-1087
  • Archon Engine: ArchonEngine likewise implements _prepare_mb_list to generate the MicroBatchList and uses it within its forward_backward_batch execution, with support for native PyTorch pipelining schedules. areal/experimental/engine/archon_engine.py:91-102
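The shared pattern (forward pass on the padded micro-batch, loss on the original one, accumulation across micro-batches) can be sketched with a toy model. Everything here is hypothetical: the function `accumulate_loss`, the `(orig_mb, padded_mb)` pairs standing in for MicroBatchItem, the `input_ids` key, and the choice of token count as the loss weight. A real engine would call backward on each weighted loss instead of summing floats.

```python
def accumulate_loss(mb_items, forward_fn, loss_fn):
    """Token-weighted loss accumulated over micro-batches."""
    weights = [len(orig_mb["input_ids"]) for orig_mb, _ in mb_items]
    total_weight = sum(weights)
    accumulated = 0.0
    for (orig_mb, padded_mb), weight in zip(mb_items, weights):
        outputs = forward_fn(padded_mb)                   # model sees padded data
        outputs = outputs[: len(orig_mb["input_ids"])]    # drop padding positions
        accumulated += loss_fn(outputs, orig_mb) * weight / total_weight
    return accumulated


# Toy "model": one float per token; toy "loss": mean over real tokens.
def toy_forward(mb):
    return [float(token) for token in mb["input_ids"]]


def toy_loss(outputs, _orig_mb):
    return sum(outputs) / len(outputs)


mb_items = [
    ({"input_ids": [1, 2, 3]}, {"input_ids": [1, 2, 3, 0]}),
    ({"input_ids": [4]}, {"input_ids": [4, 0, 0, 0]}),
]
loss = accumulate_loss(mb_items, toy_forward, toy_loss)
```

With token-count weights, the accumulated value equals the mean over all real tokens in the full batch (here 2.5), which is why the original, unpadded micro-batch is kept alongside the padded one.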

Key Utility Functions

Sources: areal/utils/data.py:1-850