
URL: https://deepwiki.com/inclusionAI/AReaL/2.3-allocation_mode-syntax

allocation_mode Syntax | inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

allocation_mode Syntax

This page provides a detailed technical reference for the allocation_mode pattern syntax, which specifies how resources are distributed across the inference and training components in AReaL. For broader configuration system concepts, see Configuration Overview. For training-specific parallelism settings, see Training Engine Configurations.

Purpose and Scope

The allocation_mode field is a pattern-based string specification that controls:

  1. GPU allocation between inference and training pools. areal/api/alloc_mode.py19-22
  2. Parallelism strategy for each component (data, tensor, pipeline, context, and expert dimensions). areal/api/alloc_mode.py33-60
  3. Backend selection (SGLang/vLLM for inference, FSDP/Megatron/Archon for training). areal/api/alloc_mode.py144-147
  4. Total GPU count required for the experiment. areal/api/alloc_mode.py153-161

The system parses this string into structured configuration objects (ParallelStrategy, AllocationType) that drive distributed execution across cluster resources. In recent AReaL versions (v0.3+), per-engine backend fields are preferred, but allocation_mode remains a convenient shorthand for SPMD launchers (local, Ray, or Slurm). docs/en/cli_reference.md102-104

Sources: areal/api/alloc_mode.py19-22 areal/api/alloc_mode.py33-60 areal/api/alloc_mode.py153-161 docs/en/cli_reference.md102-104

Syntax Components

Basic Format

<component_spec> ::= [<backend> ":"] <dimension_spec>
<allocation_mode> ::= <component_spec> [ "+" <component_spec> ]

The + operator separates components that execute on distinct GPU pools. A single component uses GPUs exclusively (e.g., for SFT or inference-only tasks), while two components split available GPUs between inference and training for asynchronous RL workflows. areal/api/alloc_mode.py19-22
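As a rough illustration of this top-level structure, the sketch below splits a mode string on + and separates the optional backend prefix from the dimension spec. It is not AReaL's parser (which is built on Lark); the name split_allocation_mode is hypothetical, and tolerating spaces around + is an assumption based on the examples at the end of this page.

```python
from typing import List, Optional, Tuple


def split_allocation_mode(mode: str) -> List[Tuple[Optional[str], str]]:
    """Split an allocation_mode string into (backend, dimension_spec) pairs.

    Hypothetical helper for illustration only. A component without a
    backend prefix (e.g. "d8") is returned with backend None.
    """
    components = []
    for part in mode.split("+"):
        part = part.strip()  # accept both "a+b" and "a + b"
        if not part:
            raise ValueError(f"empty component in {mode!r}")
        backend, sep, dims = part.partition(":")
        if sep:
            components.append((backend, dims))
        else:
            # No ":" present, so the whole part is the dimension spec.
            components.append((None, backend))
    if len(components) > 2:
        raise ValueError("at most two components (inference + training) are allowed")
    return components


print(split_allocation_mode("sglang:d2t4 + fsdp:d4t2"))
# [('sglang', 'd2t4'), ('fsdp', 'd4t2')]
```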

Sources: areal/api/alloc_mode.py19-22

Dimension Specification Syntax

[Diagram: Natural Language to Code Entity Space: Parallelism Dimensions. It maps the high-level syntax string to the internal code entities used to represent parallelism.]


Dimension Specification Table:

| Abbreviation | Dimension | Field in ParallelStrategy | Description |
|---|---|---|---|
| d | Data Parallel | data_parallel_size | Number of model replicas processing different data shards. areal/api/alloc_mode.py68-70 |
| t | Tensor Parallel | tensor_parallel_size | Horizontal split of model operations across devices. areal/api/alloc_mode.py62-64 |
| p | Pipeline Parallel | pipeline_parallel_size | Vertical split of model layers into stages. areal/api/alloc_mode.py65-67 |
| c | Context Parallel | context_parallel_size | Split of the sequence length across devices (attention-specific). areal/api/alloc_mode.py71-77 |
| e | Expert Parallel | expert_parallel_size | MoE expert distribution across devices. areal/api/alloc_mode.py78-84 |
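A minimal sketch of how a dimension spec could be mapped onto these fields, assuming each abbreviation appears at most once and omitted dimensions default to 1. parse_dims, FIELD_NAMES, and the token regex are hypothetical; only the field names come from the table.

```python
import re
from typing import Dict

# Abbreviation -> ParallelStrategy field name, transcribed from the table.
FIELD_NAMES = {
    "d": "data_parallel_size",
    "t": "tensor_parallel_size",
    "p": "pipeline_parallel_size",
    "c": "context_parallel_size",
    "e": "expert_parallel_size",
}

# Assumed token shape: one abbreviation letter followed by an integer.
_TOKEN = re.compile(r"([dtpce])(\d+)")


def parse_dims(spec: str) -> Dict[str, int]:
    """Map a spec such as "d4t2c2" to field sizes; omitted dimensions stay 1."""
    sizes = {name: 1 for name in FIELD_NAMES.values()}
    seen = set()
    pos = 0
    for match in _TOKEN.finditer(spec):
        if match.start() != pos:
            raise ValueError(f"unexpected text in {spec!r} at position {pos}")
        abbrev, value = match.group(1), int(match.group(2))
        if abbrev in seen or value < 1:
            raise ValueError(f"repeated or non-positive dimension {abbrev!r} in {spec!r}")
        seen.add(abbrev)
        sizes[FIELD_NAMES[abbrev]] = value
        pos = match.end()
    # Reject empty specs and trailing characters that no token consumed.
    if not spec or pos != len(spec):
        raise ValueError(f"invalid dimension spec {spec!r}")
    return sizes
```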

Sources: areal/api/alloc_mode.py62-84 areal/api/alloc_mode.py118-145

World Size Calculation

The total GPU count for a component (its world_size) is computed as the product of the mesh dimensions. areal/api/alloc_mode.py153-161

world_size = d × t × p × c

Important: Expert parallelism (e) does not contribute to the world size. It redistributes experts within the existing d × t × p × c mesh, and the world_size property of the ParallelStrategy class computes only that product. areal/api/alloc_mode.py153-161
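The world size rule can be sketched as a small dataclass. StrategySketch is a hypothetical stand-in that reuses the documented ParallelStrategy field names; it is not the real class and omits its validation.

```python
from dataclasses import dataclass


@dataclass
class StrategySketch:
    """Hypothetical stand-in for ParallelStrategy (field names only)."""
    data_parallel_size: int = 1
    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1
    context_parallel_size: int = 1
    expert_parallel_size: int = 1

    @property
    def world_size(self) -> int:
        # d × t × p × c; expert parallelism is left out because it
        # redistributes experts inside this mesh instead of adding GPUs.
        return (
            self.data_parallel_size
            * self.tensor_parallel_size
            * self.pipeline_parallel_size
            * self.context_parallel_size
        )


# megatron:d2p2t4 -> 2 × 4 × 2 × 1 = 16 GPUs
assert StrategySketch(2, 4, 2).world_size == 16
# Adding e4 leaves the GPU count unchanged.
assert StrategySketch(2, 4, 2, expert_parallel_size=4).world_size == 16
```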

Dimension Calculation Logic


Sources: areal/api/alloc_mode.py153-161 areal/api/alloc_mode.py95-107

Backend Identifiers

Inference Backend Syntax

<inference_spec> ::= ("sglang" | "vllm") ":" <inference_dims>
<inference_dims> ::= "d" <int> ["t" <int>] ["p" <int>]

For inference, data parallelism (d) creates separate server instances. Each instance internally uses tensor parallelism (t) and optionally pipeline parallelism (p). areal/api/alloc_mode.py33-60 For example, sglang:d16 specifies 16 SGLang server replicas.
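Assuming each inference server occupies t × p GPUs (the component's world size with c = 1, divided across d replicas), the layout of an inference spec works out as below. inference_layout is a hypothetical helper, not part of AReaL.

```python
def inference_layout(d: int, t: int = 1, p: int = 1) -> dict:
    """Describe an inference spec such as sglang:d2t4 (hypothetical helper).

    d server instances are launched; each instance is assumed to span t × p GPUs.
    """
    gpus_per_server = t * p
    return {
        "num_servers": d,
        "gpus_per_server": gpus_per_server,
        "total_gpus": d * gpus_per_server,
    }


print(inference_layout(16))      # sglang:d16
# {'num_servers': 16, 'gpus_per_server': 1, 'total_gpus': 16}
print(inference_layout(2, t=4))  # sglang:d2t4
# {'num_servers': 2, 'gpus_per_server': 4, 'total_gpus': 8}
```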

Sources: areal/api/alloc_mode.py33-60

Training Backend Syntax

<training_spec> ::= [<backend_name> ":"] <training_dims>
<backend_name> ::= "fsdp" | "megatron" | "archon"

| Backend | Supported Dims | Description |
|---|---|---|
| fsdp | d, t, c | Native PyTorch FSDP2 implementation. areal/engine/fsdp_engine.py218-219 |
| megatron | d, t, p, c, e | Megatron-LM support for MoE and PP. areal/engine/megatron_engine.py168-170 |
| archon | d, t, p, c, e | Custom torch-native parallelism engine. areal/experimental/engine/archon_engine.py147-150 |
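A hypothetical compatibility check built from the backend table might look like this. SUPPORTED_DIMS simply transcribes the table, and unsupported_dims is an illustrative name, not an AReaL function; treating a size of 1 as "dimension unused" is an assumption.

```python
from typing import Dict, Set

# Supported dimensions per training backend, transcribed from the table.
SUPPORTED_DIMS: Dict[str, Set[str]] = {
    "fsdp": {"d", "t", "c"},
    "megatron": {"d", "t", "p", "c", "e"},
    "archon": {"d", "t", "p", "c", "e"},
}


def unsupported_dims(backend: str, sizes: Dict[str, int]) -> Set[str]:
    """Return abbreviations with size > 1 that the backend does not support.

    `sizes` maps abbreviations to sizes, e.g. {"d": 2, "p": 2, "t": 4}.
    """
    if backend not in SUPPORTED_DIMS:
        raise ValueError(f"unknown training backend {backend!r}")
    allowed = SUPPORTED_DIMS[backend]
    return {dim for dim, size in sizes.items() if size > 1 and dim not in allowed}


print(unsupported_dims("fsdp", {"d": 2, "p": 2, "t": 4}))      # {'p'}
print(unsupported_dims("megatron", {"d": 2, "p": 2, "t": 4}))  # set()
```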

Backend Selection Logic


Sources: areal/api/alloc_mode.py33-60 areal/engine/fsdp_engine.py218-219 areal/engine/megatron_engine.py168-170 areal/experimental/engine/archon_engine.py147-150

Component Integration and Code Flow

Parsing and Strategy Construction

[Diagram: Code Entity Space: Allocation Flow to Process Groups. It traces the flow from the configuration string to the distributed process groups created in the training engines.]


Code Integration Points:

  1. Parsing: The syntax is parsed using Lark and converted via a Transformer in areal/api/alloc_mode.py. areal/api/alloc_mode.py8-13
  2. Validation: The ParallelStrategy.__post_init__ method validates MoE-specific constraints, ensuring world_size is divisible by the expert model parallel size. areal/api/alloc_mode.py93-108
  3. Abbreviation Mapping: ParallelStrategy exposes shorthand properties such as tp_size and pp_size, which the engines consume. areal/api/alloc_mode.py118-151
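The abbreviation mapping in step 3 can be pictured as thin read-only aliases over the full field names. This is a hedged sketch: AliasedStrategy is hypothetical, and only tp_size and pp_size are named on this page.

```python
from dataclasses import dataclass


@dataclass
class AliasedStrategy:
    """Hypothetical sketch of shorthand properties over full field names."""
    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1

    @property
    def tp_size(self) -> int:
        # Alias read by engine code, e.g. when building tensor-parallel groups.
        return self.tensor_parallel_size

    @property
    def pp_size(self) -> int:
        # Alias for the number of pipeline stages.
        return self.pipeline_parallel_size


s = AliasedStrategy(tensor_parallel_size=4, pipeline_parallel_size=2)
print(s.tp_size, s.pp_size)  # 4 2
```

Keeping the aliases as properties rather than separate fields means there is only one stored value per dimension, so the two names cannot drift apart.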

Sources: areal/api/alloc_mode.py8-13 areal/api/alloc_mode.py93-108 areal/api/alloc_mode.py118-151

MoE Hybrid Parallelism Syntax

For Mixture-of-Experts (MoE) models, the syntax supports MoE-specific folding strategies. ParallelStrategy combines expert_parallel_size and expert_tensor_parallel_size (together with the pipeline size) into expert_model_parallel_size, which is used for validation. areal/api/alloc_mode.py95-102 The total world size must be divisible by expert_model_parallel_size. areal/api/alloc_mode.py104-107
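The divisibility check can be sketched as follows, using the pp × etp × ep formula given under the validation rules and assuming expert_tensor_parallel_size defaults to 1. MoEStrategySketch is a hypothetical class, not ParallelStrategy itself.

```python
from dataclasses import dataclass


@dataclass
class MoEStrategySketch:
    """Hypothetical sketch of the MoE check performed in __post_init__."""
    data_parallel_size: int = 1
    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1
    context_parallel_size: int = 1
    expert_parallel_size: int = 1
    expert_tensor_parallel_size: int = 1  # assumed default

    @property
    def world_size(self) -> int:
        return (self.data_parallel_size * self.tensor_parallel_size
                * self.pipeline_parallel_size * self.context_parallel_size)

    @property
    def expert_model_parallel_size(self) -> int:
        # pp × etp × ep
        return (self.pipeline_parallel_size * self.expert_tensor_parallel_size
                * self.expert_parallel_size)

    def __post_init__(self) -> None:
        # Only enforced when expert parallelism is actually in use.
        if self.expert_parallel_size > 1 and self.world_size % self.expert_model_parallel_size != 0:
            raise ValueError(
                f"world_size {self.world_size} is not divisible by "
                f"expert_model_parallel_size {self.expert_model_parallel_size}"
            )


# d2 t4 p2 e4: world_size 16, expert_model_parallel_size 2 × 1 × 4 = 8 -> valid
MoEStrategySketch(data_parallel_size=2, tensor_parallel_size=4,
                  pipeline_parallel_size=2, expert_parallel_size=4)
```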

Sources: areal/api/alloc_mode.py95-107

Validation Rules and Constraints

The system validates the allocation_mode against cluster resources and model properties.

General Constraints:

  • World Size Divisibility: If expert_parallel_size > 1, the world size must be divisible by expert_model_parallel_size (calculated as pp * etp * ep). areal/api/alloc_mode.py95-107
  • Resource Allocation: The sum of world_size across all components (e.g., Inference + Training) must not exceed the available GPU count. areal/api/alloc_mode.py153-161
  • Backend Compatibility: Not every backend supports every dimension; for example, fsdp supports only d, t, and c, so pipeline parallelism (p) is not available with it. areal/engine/fsdp_engine.py218-219
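The resource rule reduces to summing per-component world sizes. A minimal sketch, assuming each component has already been parsed into a dict of abbreviation sizes; total_gpus and check_fits are hypothetical helpers.

```python
from math import prod
from typing import Dict, List


def total_gpus(component_sizes: List[Dict[str, int]]) -> int:
    """Sum d × t × p × c over all components; e is ignored, as for world_size."""
    return sum(
        prod(sizes.get(dim, 1) for dim in ("d", "t", "p", "c"))
        for sizes in component_sizes
    )


def check_fits(component_sizes: List[Dict[str, int]], available_gpus: int) -> None:
    """Raise if the combined allocation needs more GPUs than are available."""
    needed = total_gpus(component_sizes)
    if needed > available_gpus:
        raise ValueError(f"allocation needs {needed} GPUs but only {available_gpus} are available")


# sglang:d2t4 + fsdp:d4t2 needs 8 + 8 = 16 GPUs.
check_fits([{"d": 2, "t": 4}, {"d": 4, "t": 2}], available_gpus=16)
```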

Sources: areal/api/alloc_mode.py95-107 areal/api/alloc_mode.py153-161 areal/engine/fsdp_engine.py218-219

Complete Allocation Mode Examples

Single Component (Training Only)

| Allocation Mode | Backend | Training GPUs | Description |
|---|---|---|---|
| d8 | FSDP (Auto) | 8 | Simple 8-way Data Parallelism |
| d4t2 | FSDP (Auto) | 8 | Hybrid DP and TP |
| megatron:d2p2t4 | Megatron | 16 | 3D Parallelism (DP=2, PP=2, TP=4) |
| archon:d4t2c2 | Archon | 16 | Context Parallelism for long sequences |

Two Component (Inference + Training)

| Allocation Mode | Inference | Training | Total GPUs |
|---|---|---|---|
| sglang:d2t4 + fsdp:d4t2 | SGLang (8 GPUs) | FSDP (8 GPUs) | 16 |
| vllm:d4t4 + megatron:d2p2t4 | vLLM (16 GPUs) | Megatron (16 GPUs) | 32 |
| sglang:d6 + archon:d2 | SGLang (6 GPUs) | Archon (2 GPUs) | 8 |
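The GPU columns in both example tables follow from world_size = d × t × p × c, summed over components. The lenient sketch below re-derives them; world_size and total are hypothetical helpers that accept dimension letters in any order and perform none of the validation the real parser does.

```python
import re
from math import prod


def world_size(spec: str) -> int:
    """d × t × p × c for one component such as "megatron:d2p2t4" (prefix optional)."""
    dims = spec.split(":")[-1]
    sizes = {k: int(v) for k, v in re.findall(r"([dtpce])(\d+)", dims)}
    return prod(sizes.get(k, 1) for k in "dtpc")


def total(mode: str) -> int:
    """Sum world sizes over the components separated by "+"."""
    return sum(world_size(part.strip()) for part in mode.split("+"))


# Single-component rows
for mode, gpus in [("d8", 8), ("d4t2", 8), ("megatron:d2p2t4", 16), ("archon:d4t2c2", 16)]:
    assert total(mode) == gpus, mode

# Two-component rows
for mode, gpus in [("sglang:d2t4 + fsdp:d4t2", 16),
                   ("vllm:d4t4 + megatron:d2p2t4", 32),
                   ("sglang:d6 + archon:d2", 8)]:
    assert total(mode) == gpus, mode
```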

Sources: areal/api/alloc_mode.py19-22