Last indexed: 7 May 2026 (2e12c1)

MicroBatch System

The MicroBatch System provides utilities for splitting training batches into smaller micro-batches to enable gradient accumulation and efficient memory usage during distributed training. This system handles variable-length sequence batching, memory-efficient padding, and distributed synchronization of micro-batch allocations across training ranks.

For information about how micro-batches are configured, see 2.7. MicroBatchSpec and Data Configurations. For information about how micro-batches flow through training engines during forward/backward passes, see 3.5. Microbatching Pipeline.

Core Data Structures

MicroBatchSpec

MicroBatchSpec is the configuration dataclass that controls how batches are split into micro-batches. It supports both fixed-count splitting and dynamic token-balanced splitting, as well as configurable sequence packing algorithms.

MicroBatchSpec Structure

Field	Type	Description
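`n_mbs`	`int | None`	Number of micro-batches for fixed-count splitting.
`granularity`	`int`	Adjacent sequences are grouped by this size when dividing micro-batches.
`max_tokens_per_mb`	`int | None`	Maximum number of tokens per micro-batch for token-balanced splitting.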
`n_mbs_divisor`	`int`	Final number of micro-batches will be adjusted to be divisible by this value.
`packing_algorithm`	`str`	Algorithm for sequence allocation. Supports `"ffd"` (default) or `"kk"` (Karmarkar-Karp).

Sources: areal/api/cli_args.py:99-139
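As a rough illustration of the fields above, the sketch below mirrors them in a standalone dataclass. The class name `MicroBatchSpecSketch`, every default except `packing_algorithm="ffd"`, and the round-up rule in `adjust_n_mbs` are assumptions made for this example, not the definitions in areal/api/cli_args.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MicroBatchSpecSketch:
    """Hypothetical mirror of the MicroBatchSpec fields listed above."""

    n_mbs: Optional[int] = 1                 # assumed default
    granularity: int = 1                     # assumed default
    max_tokens_per_mb: Optional[int] = None  # assumed default
    n_mbs_divisor: int = 1                   # assumed default
    packing_algorithm: str = "ffd"           # "ffd" is the documented default

    def __post_init__(self):
        if self.packing_algorithm not in ("ffd", "kk"):
            raise ValueError(f"unknown packing_algorithm: {self.packing_algorithm!r}")


def adjust_n_mbs(n_mbs: int, divisor: int) -> int:
    """Round n_mbs up to a multiple of divisor (rounding up is an assumption)."""
    return -(-n_mbs // divisor) * divisor


spec = MicroBatchSpecSketch(
    n_mbs=3, max_tokens_per_mb=4096, n_mbs_divisor=2, packing_algorithm="kk"
)
final_n_mbs = adjust_n_mbs(spec.n_mbs, spec.n_mbs_divisor)  # 3 becomes 4
```

With `n_mbs_divisor=2`, a request for 3 micro-batches ends up as 4 under the assumed round-up rule, so the count stays divisible by the divisor.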

MicroBatchItem

MicroBatchItem is a NamedTuple representing a single micro-batch during iteration. It contains both the original micro-batch data (used for loss computation context) and the padded micro-batch data (used for model forward pass).

MicroBatchItem Structure

Field	Description
`orig_mb`	Original micro-batch dict before padding (for loss weight and context).
`padded_mb`	Padded micro-batch dict ready for model forward pass.
`padding_length`	Batch-level padding tokens added to this micro-batch.
`old_cu_seqlens`	Original cumulative sequence lengths before sequence alignment.
`padded_to_length`	The final padded sequence length for this micro-batch.

Sources: areal/utils/data.py:367-383
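To make the role of each field concrete, here is a minimal stand-in built with `typing.NamedTuple`. The class name `MicroBatchItemSketch`, the field types, the defaults, and the plain-list data are assumptions for illustration; the real micro-batch dicts presumably hold tensors.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class MicroBatchItemSketch(NamedTuple):
    """Hypothetical stand-in for MicroBatchItem; field types are assumed."""

    orig_mb: Dict[str, Any]
    padded_mb: Dict[str, Any]
    padding_length: int = 0
    old_cu_seqlens: Optional[List[int]] = None
    padded_to_length: Optional[int] = None


# Two sequences of 3 and 2 tokens, padded with 3 dummy tokens to a length of 8.
item = MicroBatchItemSketch(
    orig_mb={"input_ids": [11, 12, 13, 21, 22], "cu_seqlens": [0, 3, 5]},
    padded_mb={"input_ids": [11, 12, 13, 21, 22, 0, 0, 0], "cu_seqlens": [0, 3, 5, 8]},
    padding_length=3,
    old_cu_seqlens=[0, 3, 5],
    padded_to_length=8,
)

# The forward pass consumes padded_mb; the loss uses orig_mb for its weight.
n_real_tokens = item.orig_mb["cu_seqlens"][-1]
n_model_tokens = len(item.padded_mb["input_ids"])
```

Keeping both dicts in one item lets the model see the padded shape while the loss is still normalized by the real token count.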

MicroBatchList

MicroBatchList is the primary container returned by splitting functions. It holds the split micro-batches along with metadata needed for reordering outputs and managing padding.

MicroBatchList Structure

Sources: areal/utils/data.py:385-472
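The reordering metadata can be pictured as a permutation and its inverse. The sketch below assumes the container stores something equivalent to these two index lists; the function names and the list-based layout are hypothetical and are not taken from areal/utils/data.py.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_reorder_indices(groups: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Return (forward, backward) index lists for a micro-batch grouping.

    forward[j] is the original position of the j-th sequence in micro-batch
    order; backward is the inverse permutation of forward.
    """
    forward = [i for group in groups for i in group]
    backward = [0] * len(forward)
    for j, i in enumerate(forward):
        backward[i] = j
    return forward, backward


def restore_order(outputs_in_mb_order: Sequence[T], backward: Sequence[int]) -> List[T]:
    """Put per-sequence outputs back into the original batch order."""
    return [outputs_in_mb_order[j] for j in backward]


batch = ["a", "b", "c", "d", "e"]
groups = [[0, 3], [1, 2, 4]]                  # two micro-batches of sequence indices
forward, backward = build_reorder_indices(groups)
mb_order = [batch[i] for i in forward]        # order in which the model sees sequences
restored = restore_order(mb_order, backward)  # back to the original order
```

Because sequences are shuffled to balance token counts, any per-sequence output (log-probabilities, values) has to pass through the inverse permutation before it is compared with the original batch.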

Batch Splitting Algorithm

The micro-batch splitting process transforms a padded batch into multiple micro-batches with balanced token counts to maximize throughput and minimize memory spikes.

High-Level Flow

Batch Splitting Data Flow

Sources: areal/utils/data.py:477-593

Token-Balanced Allocation

The allocate_balanced_mbs() function uses a registry-based allocation strategy: it looks up the allocation function via get_allocate_fn() according to the packing_algorithm field of MicroBatchSpec.

Allocation Algorithm Dispatch

FFD (First Fit Decreasing): A greedy heuristic that sorts sequences by decreasing length and places each one in the first micro-batch bin that still has room for it. areal/utils/seqpack.py:196-203
KK (Karmarkar-Karp): The Largest Differencing Method. It produces near-optimal balance by iteratively merging the two most imbalanced partial partitions. It is recommended for large-scale RL training with high variance in sequence lengths. areal/utils/seqpack.py:162-164

Sources: areal/utils/data.py:244-253 areal/utils/seqpack.py:167-188
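The two heuristics can be sketched in a few lines of plain Python. This is a textbook-style illustration, not the code in areal/utils/seqpack.py: the names `ffd_pack` and `kk_partition`, their signatures, and the choice of a token capacity for FFD versus a fixed bin count for KK are assumptions.

```python
import heapq
import itertools
from typing import List, Sequence


def ffd_pack(seqlens: Sequence[int], capacity: int) -> List[List[int]]:
    """First Fit Decreasing: group sequence indices into bins of at most `capacity` tokens."""
    order = sorted(range(len(seqlens)), key=lambda i: seqlens[i], reverse=True)
    bins: List[List[int]] = []
    loads: List[int] = []
    for i in order:
        n = seqlens[i]
        if n > capacity:
            raise ValueError(f"sequence {i} has {n} tokens, above capacity {capacity}")
        for b, load in enumerate(loads):
            if load + n <= capacity:  # first bin with enough room wins
                bins[b].append(i)
                loads[b] += n
                break
        else:  # no existing bin fits, so open a new one
            bins.append([i])
            loads.append(n)
    return bins


def kk_partition(seqlens: Sequence[int], k: int) -> List[List[int]]:
    """Karmarkar-Karp (largest differencing) split of sequence indices into k bins."""
    if not seqlens:
        return [[] for _ in range(k)]
    tie = itertools.count()  # tie-breaker so the heap never compares partitions
    heap = []
    for i, n in enumerate(seqlens):
        # Each sequence starts as its own partial partition: one full bin, k-1 empty.
        parts = [(n, [i])] + [(0, []) for _ in range(k - 1)]
        spread = parts[0][0] - parts[-1][0]
        heapq.heappush(heap, (-spread, next(tie), parts))
    while len(heap) > 1:
        # Merge the two most imbalanced partitions, pairing heavy bins with light ones.
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda p: p[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, next(tie), merged))
    return [indices for _, indices in heap[0][2]]


ffd_bins = ffd_pack([7, 5, 4, 3, 1], capacity=10)  # [[0, 3], [1, 2, 4]]
kk_bins = kk_partition([8, 7, 6, 5, 4], k=2)       # bin loads of 16 and 14 tokens
```

FFD decides how many bins are needed for a given token budget, whereas KK balances a fixed number of bins. On the second input KK leaves a gap of 2 tokens even though a perfect 15/15 split exists, which is why the method is described as near-optimal rather than optimal.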

Distributed Synchronization

allocate_balanced_mbs_synced() ensures all ranks in a data-parallel group use the same number of micro-batches by performing an all_gather_object on the local micro-batch counts and taking the maximum.

Micro-batch Sync Sequence

Sources: areal/utils/data.py:256-270
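The gather-and-maximum step can be sketched as below. The helper names `sync_mb_count` and `torch_gather_counts` are hypothetical; only `torch.distributed.all_gather_object` and `get_world_size` are real PyTorch calls, and the torch-backed helper assumes an initialized process group. The gather is injected as a callable so the logic can be exercised without a distributed launch.

```python
from typing import Callable, List


def sync_mb_count(local_n_mbs: int, gather_counts: Callable[[int], List[int]]) -> int:
    """Return the micro-batch count every rank should use: the maximum over ranks."""
    return max(gather_counts(local_n_mbs))


def torch_gather_counts(local_n_mbs: int, group=None) -> List[int]:
    """Gather each rank's local count; requires an initialized process group."""
    import torch.distributed as dist  # imported lazily so the sketch loads without torch

    counts = [None] * dist.get_world_size(group=group)
    dist.all_gather_object(counts, local_n_mbs, group=group)
    return counts


# Single-process simulation: pretend the other two ranks need 2 and 4 micro-batches.
def fake_gather(n: int) -> List[int]:
    return [2, n, 4]


agreed = sync_mb_count(3, fake_gather)  # this rank needs 3, but all ranks will use 4
```

Using the maximum means a rank with short sequences runs more micro-batches than it strictly needs, but every rank then issues the same number of collective calls, which avoids hangs in data-parallel training.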

Padding Strategies

Sequence Packing and Padding

AReaL uses a packed sequence representation with cumulative sequence lengths (cu_seqlens) to minimize padding. Padding is applied at two levels:

Sequence-level alignment: Aligns individual sequences to multiples of seq_align_to (required for Ulysses sequence parallelism). areal/utils/data.py:599-686
Batch-level padding: Adds dummy sequences to reach a specific target length (often for memory alignment or Ulysses constraints). areal/utils/data.py:688-793
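Both levels can be expressed purely in terms of cu_seqlens. The sketch below works on Python lists rather than tensors; the function names, and the choice to model batch-level padding as a single dummy sequence, are assumptions rather than the behavior of the functions in areal/utils/data.py.

```python
from typing import List, Sequence, Tuple


def align_cu_seqlens(cu_seqlens: Sequence[int], seq_align_to: int) -> List[int]:
    """Round every sequence length up to a multiple of seq_align_to."""
    lengths = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]
    aligned = [-(-n // seq_align_to) * seq_align_to for n in lengths]
    out = [0]
    for n in aligned:
        out.append(out[-1] + n)
    return out


def pad_batch_to_length(cu_seqlens: Sequence[int], target: int) -> Tuple[List[int], int]:
    """Append one dummy sequence so the packed batch holds `target` tokens.

    Returns the new cu_seqlens and the number of padding tokens added.
    """
    total = cu_seqlens[-1]
    if target < total:
        raise ValueError(f"target {target} is smaller than the batch ({total} tokens)")
    if target == total:
        return list(cu_seqlens), 0
    return list(cu_seqlens) + [target], target - total


cu = [0, 5, 12, 13]                                  # sequence lengths 5, 7, 1
aligned_cu = align_cu_seqlens(cu, seq_align_to=4)    # lengths become 8, 8, 4
padded_cu, pad_len = pad_batch_to_length(aligned_cu, target=32)
```

In this example the aligned boundaries are `[0, 8, 16, 20]`, and reaching 32 tokens adds one dummy sequence of 12 tokens. Keeping the original `cu` around is what allows the padding to be stripped from model outputs afterwards.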

Padding Function: `pad_mb_list()`

The pad_mb_list() function applies padding to each micro-batch in a MicroBatchList. It can pad to the maximum sequence length found in the list or to individual page-aligned lengths.

Padding Logic Flow

Sources: areal/utils/data.py:688-793
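The difference between the two padding modes comes down to how the target length of each micro-batch is chosen. The following sketch illustrates that choice only; `target_lengths`, `pad_to_maximum`, and `align_to` are hypothetical names, not the parameters of `pad_mb_list()`.

```python
from typing import List, Sequence


def round_up(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def target_lengths(mb_token_counts: Sequence[int], pad_to_maximum: bool, align_to: int = 1) -> List[int]:
    """Pick a padded length for every micro-batch.

    pad_to_maximum=True gives all micro-batches the same (aligned) length;
    otherwise each micro-batch is rounded up on its own.
    """
    if pad_to_maximum:
        longest = round_up(max(mb_token_counts), align_to)
        return [longest] * len(mb_token_counts)
    return [round_up(n, align_to) for n in mb_token_counts]


counts = [100, 250, 130]
uniform = target_lengths(counts, pad_to_maximum=True, align_to=128)
per_mb = target_lengths(counts, pad_to_maximum=False, align_to=128)
wasted_uniform = sum(t - n for t, n in zip(uniform, counts))
wasted_per_mb = sum(t - n for t, n in zip(per_mb, counts))
```

Uniform lengths give every micro-batch the same shape at the cost of more padding tokens, while per-micro-batch alignment wastes fewer tokens but produces varying shapes.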

Integration with Training Engines

All training engines follow a common pattern for micro-batch processing.

Common Training Flow

Engine Micro-batch Processing
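A generic gradient accumulation loop, assuming each micro-batch loss is weighted by its share of the batch's tokens, can be sketched as follows. The toy model (a single scalar weight with a squared-error loss) and the function names are hypothetical and are not taken from any AReaL engine; the point is only that the weighted per-micro-batch gradients sum to the full-batch gradient.

```python
from typing import List, Sequence, Tuple

MicroBatch = Tuple[Sequence[float], Sequence[float]]  # (inputs, targets)


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulate_grad(w: float, micro_batches: List[MicroBatch]) -> float:
    """Accumulate gradients over micro-batches, weighting each by its token share."""
    total = sum(len(xs) for xs, _ in micro_batches)
    grad = 0.0
    for xs, ys in micro_batches:
        weight = len(xs) / total            # loss weight of this micro-batch
        grad += weight * grad_mse(w, xs, ys)
    return grad                             # an optimizer step would follow here


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0]
full_grad = grad_mse(1.0, xs, ys)
split_grad = accumulate_grad(1.0, [(xs[:3], ys[:3]), (xs[3:], ys[3:])])
```

Without the weighting, a micro-batch of one token would count as much as a micro-batch of three, and the accumulated gradient would no longer match the full-batch result.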

Engine-Specific Implementation

FSDP Engine: The FSDPEngine uses _prepare_mb_list to split the batch, then iterates through the list and prepares each micro-batch with _prepare_mb_inputs, which handles Ulysses sequence-parallel slicing and tree training logic. areal/engine/fsdp_engine.py:1090-1197
Megatron Engine: The MegatronEngine uses _prepare_mb_list and then passes the list to its internal forward_backward_func. The forward_step callback extracts padded_mb from the MicroBatchItem provided by the iterator. areal/engine/megatron_engine.py:570-619 areal/engine/megatron_engine.py:1053-1087
Archon Engine: The ArchonEngine likewise implements _prepare_mb_list to generate the MicroBatchList and uses it in its forward_backward_batch execution, which supports native PyTorch pipelining schedules. areal/experimental/engine/archon_engine.py:91-102

Key Utility Functions

pack_tensor_dict(): Converts padded batch tensors into packed format with cu_seqlens. areal/utils/data.py:273-322
split_padded_tensor_dict_into_mb_list(): The primary entry point for micro-batch splitting logic. areal/utils/data.py:477-593
pad_mb_list(): Handles batch-level and sequence-level padding for a list of micro-batches. areal/utils/data.py:688-793
unsqueeze_mb_list(): Converts packed micro-batches into 3D tensors [1, S, ...] for specific engine requirements. areal/utils/data.py:795-850

Sources: areal/utils/data.py:1-850
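The padded-to-packed conversion behind these utilities can be illustrated on plain lists. This is a minimal sketch of the idea only: `pack_padded` is a hypothetical name, and the real `pack_tensor_dict()` operates on tensor dicts whose exact keys are not shown here.

```python
from typing import List, Sequence, Tuple


def pack_padded(
    input_ids: Sequence[Sequence[int]],
    attention_mask: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int]]:
    """Drop padded positions and concatenate the rows into one packed sequence.

    Returns (packed_ids, cu_seqlens), where cu_seqlens[i] and cu_seqlens[i + 1]
    are the start and end offsets of sequence i inside packed_ids.
    """
    packed: List[int] = []
    cu_seqlens = [0]
    for row, mask in zip(input_ids, attention_mask):
        kept = [tok for tok, m in zip(row, mask) if m]
        packed.extend(kept)
        cu_seqlens.append(cu_seqlens[-1] + len(kept))
    return packed, cu_seqlens


ids = [[1, 2, 3, 0], [4, 5, 0, 0]]
mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
packed_ids, cu_seqlens = pack_padded(ids, mask)
second_seq = packed_ids[cu_seqlens[1]:cu_seqlens[2]]  # recover the second sequence
```

The padded layout stores 8 positions for 5 real tokens; the packed layout stores exactly 5, and `cu_seqlens` carries enough information to slice any sequence back out.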

URL: https://deepwiki.com/inclusionAI/AReaL/10.1-microbatch-system