Last indexed: 7 May 2026 (2e12c1)

allocation_mode Syntax

This page provides a detailed technical reference for the allocation_mode pattern syntax used to specify resource distribution across inference and training components in AReaL. For broader configuration system concepts, see Configuration Overview. For training-specific parallelism settings, see Training Engine Configurations.

Purpose and Scope

The allocation_mode field is a pattern-based string specification that controls:

GPU allocation between inference and training pools. areal/api/alloc_mode.py19-22
Parallelism strategy for each component (data, tensor, pipeline, context, and expert dimensions). areal/api/alloc_mode.py33-60
Backend selection (SGLang/vLLM for inference, FSDP/Megatron/Archon for training). areal/api/alloc_mode.py144-147
Total GPU count required for the experiment. areal/api/alloc_mode.py153-161

The system parses this string into structured configuration objects (ParallelStrategy, AllocationType) that orchestrate distributed execution across cluster resources. In modern AReaL versions (v0.3+), while per-engine backend fields are preferred, allocation_mode remains a powerful shorthand for SPMD launchers (local, Ray, or Slurm). docs/en/cli_reference.md102-104

Sources: areal/api/alloc_mode.py19-22 areal/api/alloc_mode.py33-60 areal/api/alloc_mode.py153-161 docs/en/cli_reference.md102-104

Syntax Components

Basic Format

<allocation_mode> ::= <component_spec> [ "+" <component_spec> ]
<component_spec> ::= [ <backend> ":" ] <dimension_spec>

The + operator separates components that execute on distinct GPU pools. A single component uses GPUs exclusively (e.g., for SFT or inference-only tasks), while two components split available GPUs between inference and training for asynchronous RL workflows. areal/api/alloc_mode.py19-22

Sources: areal/api/alloc_mode.py19-22
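
To make the basic format concrete, the following is a minimal sketch of how a string could be split into its components. It is not AReaL's parser (which is built on Lark); the helper name split_components is hypothetical, and it assumes that whitespace around "+" is insignificant and that a missing "backend:" prefix is allowed.

```python
from typing import List, Optional, Tuple


def split_components(allocation_mode: str) -> List[Tuple[Optional[str], str]]:
    """Split an allocation_mode string into (backend, dims) pairs (illustrative only)."""
    parts = [part.strip() for part in allocation_mode.split("+")]
    if not 1 <= len(parts) <= 2 or not all(parts):
        raise ValueError(f"expected one or two components, got {allocation_mode!r}")
    components = []
    for part in parts:
        backend, sep, dims = part.partition(":")
        if not sep:
            # No "backend:" prefix, so the whole part is the dimension spec.
            backend, dims = None, part
        components.append((backend, dims))
    return components


print(split_components("sglang:d2t4 + fsdp:d4t2"))  # [('sglang', 'd2t4'), ('fsdp', 'd4t2')]
print(split_components("d8"))                       # [(None, 'd8')]
```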

Dimension Specification Syntax

The following diagram bridges the high-level syntax string to the internal code entities used to represent parallelism.

[Diagram: Natural Language to Code Entity Space: Parallelism Dimensions]

Dimension Specification Table:

Abbreviation	Dimension	Field in `ParallelStrategy`	Description
`d`	Data Parallel	`data_parallel_size`	Number of model replicas processing different data shards. areal/api/alloc_mode.py68-70
`t`	Tensor Parallel	`tensor_parallel_size`	Horizontal split of model operations across devices. areal/api/alloc_mode.py62-64
`p`	Pipeline Parallel	`pipeline_parallel_size`	Vertical split of model layers into stages. areal/api/alloc_mode.py65-67
`c`	Context Parallel	`context_parallel_size`	Split sequence length (attention-specific). areal/api/alloc_mode.py71-77
`e`	Expert Parallel	`expert_parallel_size`	MoE expert distribution across devices. areal/api/alloc_mode.py78-84

Sources: areal/api/alloc_mode.py62-84 areal/api/alloc_mode.py118-145
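
The abbreviation-to-field mapping in the table can be sketched as a small lookup plus a regular expression. This is an illustration rather than the library's implementation: parse_dims and ABBREVIATION_TO_FIELD are hypothetical names, and the sketch assumes that any omitted dimension defaults to 1 and that each abbreviation may appear at most once.

```python
import re
from typing import Dict

# Mapping taken from the dimension table above.
ABBREVIATION_TO_FIELD = {
    "d": "data_parallel_size",
    "t": "tensor_parallel_size",
    "p": "pipeline_parallel_size",
    "c": "context_parallel_size",
    "e": "expert_parallel_size",
}


def parse_dims(spec: str) -> Dict[str, int]:
    """Turn a dimension spec such as 'd2p2t4' into field-name/size pairs."""
    if not re.fullmatch(r"(?:[dtpce]\d+)+", spec):
        raise ValueError(f"malformed dimension spec: {spec!r}")
    # Assumption: dimensions that are not mentioned default to 1.
    sizes = {field: 1 for field in ABBREVIATION_TO_FIELD.values()}
    seen = set()
    for abbr, value in re.findall(r"([dtpce])(\d+)", spec):
        if abbr in seen:
            raise ValueError(f"dimension {abbr!r} given more than once in {spec!r}")
        seen.add(abbr)
        sizes[ABBREVIATION_TO_FIELD[abbr]] = int(value)
    return sizes


print(parse_dims("d2p2t4")["tensor_parallel_size"])  # 4
```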

World Size Calculation

The total GPU count for a component (its world_size) is computed as the product of the mesh dimensions. areal/api/alloc_mode.py153-161

world_size = d × t × p × c

Important: Expert parallelism (e) does not contribute to world size calculation. It redistributes experts within the existing d × t × p × c mesh. The ParallelStrategy class explicitly calculates this in its world_size property. areal/api/alloc_mode.py153-161

[Diagram: Dimension Calculation Logic]

Sources: areal/api/alloc_mode.py153-161 areal/api/alloc_mode.py95-107
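
As a worked example of the formula above, the sketch below computes world_size from a dimension spec and shows that e is ignored. The function name and the regular-expression parsing are assumptions made for illustration; omitted dimensions are assumed to be 1.

```python
import re
from math import prod


def world_size(dims: str) -> int:
    """Compute d * t * p * c for a dimension spec; 'e' is deliberately excluded."""
    sizes = {abbr: int(value) for abbr, value in re.findall(r"([dtpce])(\d+)", dims)}
    # Assumption: a dimension that is not listed has size 1.
    return prod(sizes.get(abbr, 1) for abbr in "dtpc")


print(world_size("d2p2t4"))    # 2 * 4 * 2 * 1 = 16
print(world_size("d4t2c2"))    # 4 * 2 * 1 * 2 = 16
print(world_size("d2t2p2e4"))  # 2 * 2 * 2 * 1 = 8, e4 does not change the count
```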

Backend Identifiers

Inference Backend Syntax

<inference_spec> ::= ("sglang" | "vllm") ":" <inference_dims>
<inference_dims> ::= "d" <int> ["t" <int>] ["p" <int>]

For inference, data parallelism (d) creates separate server instances. Each instance internally uses tensor parallelism (t) and optionally pipeline parallelism (p). For example, sglang:d16 specifies 16 SGLang server replicas. areal/api/alloc_mode.py33-60

Sources: areal/api/alloc_mode.py33-60
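
The relationship between d, t, and p for inference can be illustrated with a short sketch that follows the grammar above. The function inference_layout and its return keys are hypothetical, and the sketch assumes that omitted t and p default to 1, so that each server instance spans t × p GPUs.

```python
import re
from typing import Dict, Union


def inference_layout(spec: str) -> Dict[str, Union[str, int]]:
    """Describe an inference spec: server count and GPUs per server (illustrative)."""
    backend, _, dims = spec.partition(":")
    if backend not in ("sglang", "vllm"):
        raise ValueError(f"unknown inference backend in {spec!r}")
    # Mirrors <inference_dims>: "d" <int> ["t" <int>] ["p" <int>]
    match = re.fullmatch(r"d(\d+)(?:t(\d+))?(?:p(\d+))?", dims)
    if match is None:
        raise ValueError(f"malformed inference dims in {spec!r}")
    # Assumption: omitted t and p default to 1.
    d, t, p = (int(group) if group else 1 for group in match.groups())
    return {
        "backend": backend,
        "servers": d,               # one server instance per data-parallel rank
        "gpus_per_server": t * p,   # each instance shards the model over t * p GPUs
        "total_gpus": d * t * p,
    }


print(inference_layout("sglang:d2t4"))
```

Under these assumptions, sglang:d2t4 describes 2 servers of 4 GPUs each (8 GPUs total), and sglang:d16 describes 16 single-GPU servers.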

Training Backend Syntax

<training_spec> ::= [<backend_name> ":"] <training_dims>
<backend_name> ::= "fsdp" | "megatron" | "archon"

Backend	Supported Dims	Description
`fsdp`	`d`, `t`, `c`	Native PyTorch FSDP2 implementation. areal/engine/fsdp_engine.py218-219
`megatron`	`d`, `t`, `p`, `c`, `e`	Megatron-LM support for MoE and PP. areal/engine/megatron_engine.py168-170
`archon`	`d`, `t`, `p`, `c`, `e`	Custom torch-native parallelism engine. areal/experimental/engine/archon_engine.py147-150

[Diagram: Backend Selection Logic]

Sources: areal/api/alloc_mode.py33-60 areal/engine/fsdp_engine.py218-219 areal/engine/megatron_engine.py168-170 areal/experimental/engine/archon_engine.py147-150
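
The supported-dimension table can be turned into a simple pre-flight check. The sketch below is hypothetical (the names SUPPORTED_DIMS and unsupported_dims are not AReaL APIs) and only encodes the table on this page; the engines themselves may apply additional checks.

```python
import re
from typing import List

# Supported dimensions per training backend, copied from the table above.
SUPPORTED_DIMS = {
    "fsdp": {"d", "t", "c"},
    "megatron": {"d", "t", "p", "c", "e"},
    "archon": {"d", "t", "p", "c", "e"},
}


def unsupported_dims(backend: str, dims: str) -> List[str]:
    """Return the abbreviations in dims that the backend does not list as supported."""
    if backend not in SUPPORTED_DIMS:
        raise ValueError(f"unknown training backend: {backend!r}")
    used = re.findall(r"([dtpce])\d+", dims)
    return [abbr for abbr in used if abbr not in SUPPORTED_DIMS[backend]]


print(unsupported_dims("fsdp", "d4p2"))        # ['p']
print(unsupported_dims("megatron", "d2p2t4"))  # []
```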

Component Integration and Code Flow

Parsing and Strategy Construction

The following diagram traces the flow from the configuration string to the distributed process groups created in the training engines.

[Diagram: Code Entity Space: Allocation Flow to Process Groups]

Code Integration Points:

Parsing: The syntax is parsed using Lark and converted via a Transformer in areal/api/alloc_mode.py. areal/api/alloc_mode.py8-13
Validation: The ParallelStrategy.__post_init__ method validates MoE-specific constraints, ensuring world_size is divisible by the expert model parallel size. areal/api/alloc_mode.py93-108
Abbreviation Mapping: The class provides properties like tp_size, pp_size, etc., for engine consumption. areal/api/alloc_mode.py118-151

Sources: areal/api/alloc_mode.py8-13 areal/api/alloc_mode.py93-108 areal/api/alloc_mode.py118-151
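
The validation and abbreviation-mapping points can be sketched with a small dataclass. StrategySketch is a hypothetical stand-in, not the real ParallelStrategy: it reuses the field names from this page, assumes every size defaults to 1, and assumes the expert model parallel size is the product pp × etp × ep as stated in the validation rules on this page.

```python
from dataclasses import dataclass


@dataclass
class StrategySketch:
    """Illustrative stand-in for a parallel strategy; not AReaL's ParallelStrategy."""

    data_parallel_size: int = 1
    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1
    context_parallel_size: int = 1
    expert_parallel_size: int = 1
    expert_tensor_parallel_size: int = 1

    def __post_init__(self) -> None:
        # The MoE constraint only applies when expert parallelism is enabled.
        if self.expert_parallel_size > 1:
            expert_model_parallel_size = (
                self.pipeline_parallel_size
                * self.expert_tensor_parallel_size
                * self.expert_parallel_size
            )
            if self.world_size % expert_model_parallel_size != 0:
                raise ValueError(
                    f"world_size {self.world_size} is not divisible by "
                    f"expert model parallel size {expert_model_parallel_size}"
                )

    @property
    def world_size(self) -> int:
        # Expert parallelism is excluded from the product.
        return (
            self.data_parallel_size
            * self.tensor_parallel_size
            * self.pipeline_parallel_size
            * self.context_parallel_size
        )

    @property
    def tp_size(self) -> int:
        return self.tensor_parallel_size

    @property
    def pp_size(self) -> int:
        return self.pipeline_parallel_size


ok = StrategySketch(data_parallel_size=2, tensor_parallel_size=2, expert_parallel_size=4)
print(ok.world_size, ok.tp_size, ok.pp_size)  # 4 2 1
```

Constructing StrategySketch(data_parallel_size=2, tensor_parallel_size=2, expert_parallel_size=3) raises ValueError in this sketch, because 4 is not divisible by 3.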

MoE Hybrid Parallelism Syntax

For Mixture-of-Experts (MoE) models, the syntax supports complex folding strategies. In ParallelStrategy, expert_parallel_size and expert_tensor_parallel_size, together with the pipeline parallel size, are used to calculate expert_model_parallel_size (pp * etp * ep) for validation. The system ensures that the total world size is divisible by the expert model parallel size. areal/api/alloc_mode.py95-107

Sources: areal/api/alloc_mode.py95-107

Validation Rules and Constraints

The system validates the allocation_mode against cluster resources and model properties.

General Constraints:

World Size Divisibility: If expert_parallel_size > 1, the world size must be divisible by expert_model_parallel_size (calculated as pp * etp * ep). areal/api/alloc_mode.py95-107
Resource Allocation: The sum of world_size across all components (e.g., Inference + Training) must not exceed the available GPU count. areal/api/alloc_mode.py153-161
Backend Compatibility: Each training backend accepts only its supported dimensions. For example, fsdp supports d, t, and c, so it cannot be combined with pipeline parallelism (p). areal/engine/fsdp_engine.py218-219

Sources: areal/api/alloc_mode.py95-107 areal/api/alloc_mode.py153-161 areal/engine/fsdp_engine.py218-219
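
The resource allocation constraint can be checked with a few lines of arithmetic. The sketch below is a hypothetical helper, not part of AReaL: it sums d × t × p × c over all components and compares the result to a GPU budget supplied by the caller, assuming omitted dimensions are 1.

```python
import re
from math import prod


def component_world_size(component: str) -> int:
    """GPU count of one component, e.g. 'fsdp:d4t2' or 'd8' (e is excluded)."""
    dims = component.split(":")[-1].strip()  # drop the optional "backend:" prefix
    sizes = {abbr: int(value) for abbr, value in re.findall(r"([dtpce])(\d+)", dims)}
    return prod(sizes.get(abbr, 1) for abbr in "dtpc")


def fits_cluster(allocation_mode: str, available_gpus: int) -> bool:
    """True if the components together need no more GPUs than are available."""
    required = sum(component_world_size(c) for c in allocation_mode.split("+"))
    return required <= available_gpus


print(fits_cluster("sglang:d2t4 + fsdp:d4t2", 16))  # True: 8 + 8 = 16
print(fits_cluster("sglang:d2t4 + fsdp:d4t2", 8))   # False: 16 GPUs needed
```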

Complete Allocation Mode Examples

Single Component (Training Only)

Allocation Mode	Backend	Training GPUs	Description
`d8`	FSDP (Auto)	8	Simple 8-way Data Parallelism
`d4t2`	FSDP (Auto)	8	Hybrid DP and TP
`megatron:d2p2t4`	Megatron	16	3D Parallelism (DP=2, PP=2, TP=4)
`archon:d4t2c2`	Archon	16	Context Parallelism for long sequences

Two Component (Inference + Training)

Allocation Mode	Inference	Training	Total GPUs
`sglang:d2t4 + fsdp:d4t2`	SGLang (8 GPUs)	FSDP (8 GPUs)	16
`vllm:d4t4 + megatron:d2p2t4`	vLLM (16 GPUs)	Megatron (16 GPUs)	32
`sglang:d6 + archon:d2`	SGLang (6 GPUs)	Archon (2 GPUs)	8

Sources: areal/api/alloc_mode.py19-22

URL: https://deepwiki.com/inclusionAI/AReaL/2.3-allocation_mode-syntax