
URL: https://deepwiki.com/inclusionAI/AReaL/2.4-training-engine-configurations

Last indexed: 7 May 2026 (2e12c1)

Training Engine Configurations

This page documents the configuration structures for AReaL's training engines, including TrainEngineConfig, OptimizerConfig, and engine-specific settings for FSDP, Megatron, and Archon backends. These configurations control model training behavior, optimization parameters, parallelism strategies, and backend-specific features.

For information about inference engine configurations, see Inference Engine Configurations. For parallelism strategy configuration, see allocation_mode Syntax. For algorithm-specific configurations such as PPO parameters, see Algorithm-Specific Configurations.


Configuration Architecture

Training engine configurations in AReaL follow a hierarchical structure where TrainEngineConfig serves as the core configuration containing engine-specific sub-configurations.

Configuration Hierarchy


Sources: areal/api/cli_args.py:889-1005, areal/api/cli_args.py:306-375


TrainEngineConfig

TrainEngineConfig is the core configuration class for training engines, containing common parameters that apply across all backends as well as engine-specific sub-configurations.

Core Training Parameters

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| experiment_name | string | Required | Name of the experiment | areal/api/cli_args.py:892 |
| trial_name | string | Required | Name of the trial within the experiment | areal/api/cli_args.py:895 |
| path | string | "" | Path to HuggingFace checkpoint or model identifier | areal/api/cli_args.py:898 |
| attn_impl | string | "flash_attention_2" | Attention implementation. Choices: "flash_attention_2", "sdpa", "eager" | areal/api/cli_args.py:901 |
| init_from_scratch | boolean | False | Initialize model weights randomly instead of loading pretrained weights | areal/api/cli_args.py:910 |
| is_critic | boolean | False | Whether this engine is for a critic/reward model | areal/api/cli_args.py:913 |
| temperature | float | 1.0 | Temperature for generation (if applicable) | areal/api/cli_args.py:916 |
| mb_spec | MicroBatchSpec | default | Micro-batch specification for memory management | areal/api/cli_args.py:919 |
| pad_to_maximum | boolean | False | Pad each micro-batch to maximum length (reduces fragmentation, slower) | areal/api/cli_args.py:922 |

Training Backend Settings

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| disable_dropout | boolean | False | Disable dropout layers during training | areal/api/cli_args.py:925 |
| gradient_checkpointing | boolean | False | Enable gradient checkpointing to reduce memory usage | areal/api/cli_args.py:928 |
| dtype | string | "bfloat16" | Parameter data type | areal/api/cli_args.py:931 |
| grad_reduce_dtype | string | "float32" | Gradient reduction data type for distributed training | areal/api/cli_args.py:934 |
| optimizer | OptimizerConfig or None | None | Optimizer configuration. None means no training (inference only) | areal/api/cli_args.py:937 |
| weight_update_mode | string | "xccl" | Weight update backend. Choices: "disk", "xccl" | areal/api/cli_args.py:942 |

Engine-Specific Configurations

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| fsdp | FSDPEngineConfig | default | FSDP engine-specific settings | areal/api/cli_args.py:947 |
| archon | ArchonEngineConfig | default | Archon engine-specific settings | areal/api/cli_args.py:950 |
| megatron | MegatronEngineConfig | default | Megatron engine-specific settings | areal/api/cli_args.py:953 |

LoRA Configuration

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| use_lora | boolean | False | Enable LoRA (Low-Rank Adaptation) training. Supported with FSDP and Megatron | areal/api/cli_args.py:956 |
| lora_rank | integer | 32 | LoRA rank parameter | areal/api/cli_args.py:959 |
| lora_alpha | integer | 16 | LoRA alpha parameter | areal/api/cli_args.py:962 |
| target_modules | list of string | [] | Target modules for LoRA adaptation | areal/api/cli_args.py:965 |
| peft_type | string | "lora" | PEFT method type. Only LoRA is currently supported | areal/api/cli_args.py:968 |

Tree Training Configuration

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| enable_tree_training | boolean | False | Enable tree training for prefix-sharing efficiency | areal/api/cli_args.py:971 |

Sources: areal/api/cli_args.py:889-1005
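
As a rough illustration of the structure described in this section, the sketch below mirrors a handful of the documented fields with plain dataclasses. The classes with the `Sketch` suffix are hypothetical stand-ins, not the actual classes in areal/api/cli_args.py; only the field names and defaults are taken from the tables on this page. The `lora_scaling` helper assumes the conventional LoRA scaling of alpha / rank, which this page does not confirm for AReaL, and the checkpoint path is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OptimizerConfigSketch:
    # Hypothetical stand-in; defaults copied from the OptimizerConfig tables.
    type: str = "adam"
    lr: float = 0.001
    weight_decay: float = 0.01


@dataclass
class TrainEngineConfigSketch:
    # Hypothetical stand-in covering a subset of TrainEngineConfig fields.
    experiment_name: str
    trial_name: str
    path: str = ""
    attn_impl: str = "flash_attention_2"
    dtype: str = "bfloat16"
    gradient_checkpointing: bool = False
    optimizer: Optional[OptimizerConfigSketch] = None  # None: no training
    weight_update_mode: str = "xccl"
    use_lora: bool = False
    lora_rank: int = 32
    lora_alpha: int = 16
    target_modules: List[str] = field(default_factory=list)

    def is_trainable(self) -> bool:
        # Per the table, a missing optimizer means inference only.
        return self.optimizer is not None

    def lora_scaling(self) -> float:
        # Assumption: conventional LoRA scaling factor alpha / rank.
        return self.lora_alpha / self.lora_rank


# A trainable engine with LoRA enabled, and a frozen engine without an optimizer.
actor = TrainEngineConfigSketch(
    experiment_name="demo",
    trial_name="run0",
    path="path/to/hf-checkpoint",  # placeholder
    optimizer=OptimizerConfigSketch(lr=2e-5),
    use_lora=True,
    target_modules=["q_proj", "v_proj"],
)
ref = TrainEngineConfigSketch(
    experiment_name="demo", trial_name="run0", path="path/to/hf-checkpoint"
)
print(actor.is_trainable(), ref.is_trainable(), actor.lora_scaling())
```

With the documented defaults (lora_rank 32, lora_alpha 16), the assumed scaling factor works out to 0.5.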

Configuration Flow Diagram


Sources: areal/api/cli_args.py:889-1005, areal/engine/fsdp_engine.py:218-222, areal/engine/megatron_engine.py:168-173, areal/experimental/engine/archon_engine.py:150-155


OptimizerConfig

OptimizerConfig specifies optimizer type, learning rate, scheduling, and related hyperparameters. It is referenced by TrainEngineConfig.optimizer.

Optimizer Type and Learning Rate

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| type | string | "adam" | Optimizer type. Choices: "adam", "sgd", "adam_bf16". For FSDP, adam_bf16 enables memory-efficient BF16 optimizer states via AnyPrecisionAdamW | areal/api/cli_args.py:309-315, areal/engine/fsdp_utils/optimizer.py:85 |
| lr | float | 0.001 | Learning rate | areal/api/cli_args.py:318 |
| weight_decay | float | 0.01 | Weight decay coefficient | areal/api/cli_args.py:321 |

Adam-Specific Parameters

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| beta1 | float | 0.9 | Adam beta1 parameter. Only effective for adam/adam_bf16 | areal/api/cli_args.py:324 |
| beta2 | float | 0.999 | Adam beta2 parameter. Only effective for adam/adam_bf16 | areal/api/cli_args.py:327 |
| eps | float | 1e-8 | Adam epsilon parameter. Only effective for adam/adam_bf16 | areal/api/cli_args.py:330 |

Learning Rate Scheduling

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| lr_scheduler_type | string | "constant" | Learning rate scheduler type. Choices: "linear", "cosine", "constant" | areal/api/cli_args.py:333-337 |
| warmup_steps_proportion | float | 0.001 | Proportion of training steps for warmup | areal/api/cli_args.py:340 |
| min_lr_ratio | float | 0.0 | Minimum learning rate ratio after annealing | areal/api/cli_args.py:343 |
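
To make the interaction of these three parameters concrete, here is a minimal sketch of one common way such a schedule is computed: linear warmup over `warmup_steps_proportion` of the run, followed by a constant, linear, or cosine phase that anneals toward `lr * min_lr_ratio`. The function `lr_at_step` and its exact warmup and decay formulas are assumptions for illustration; the schedulers the AReaL engines actually use may differ in detail.

```python
import math


def lr_at_step(step, total_steps, base_lr=0.001, scheduler="constant",
               warmup_steps_proportion=0.001, min_lr_ratio=0.0):
    """Illustrative schedule: linear warmup, then constant/linear/cosine."""
    warmup_steps = int(warmup_steps_proportion * total_steps)
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    if scheduler == "constant":
        return base_lr
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    min_lr = base_lr * min_lr_ratio
    if scheduler == "linear":
        return min_lr + (base_lr - min_lr) * (1.0 - progress)
    if scheduler == "cosine":
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_lr + (base_lr - min_lr) * cosine
    raise ValueError(f"unknown scheduler: {scheduler!r}")


# Print the learning rate at a few points of a 1000-step cosine run.
for s in (0, 99, 100, 550, 1000):
    print(s, lr_at_step(s, 1000, 1e-3, "cosine", 0.1, 0.1))
```

Under these assumptions, `min_lr_ratio=0.1` means the learning rate ends at one tenth of `lr` rather than at zero.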

Optimizer State Management

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| offload | boolean | False | Enable optimizer state offloading to CPU | areal/api/cli_args.py:346 |

Mixed Precision Training (Loss Scaling)

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| initial_loss_scale | float | 4294967296 (2^32) | Initial loss scaling factor | areal/api/cli_args.py:349 |
| min_loss_scale | float | 1.0 | Minimum loss scaling factor | areal/api/cli_args.py:352 |
| loss_scale_window | float | 5 | Window size for loss scaling adjustment | areal/api/cli_args.py:355 |
| hysteresis | integer | 2 | Hysteresis (scaling factor) for loss scaling | areal/api/cli_args.py:358 |
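
The sketch below shows how dynamic loss scaling parameters of this kind typically interact. It follows the general scheme used by Megatron-style dynamic gradient scalers: back off after `hysteresis` overflowing steps, grow after `loss_scale_window` consecutive clean steps. The class `DynamicLossScalerSketch` and the growth and backoff factors of 2.0 and 0.5 are assumptions for illustration, not values stated on this page.

```python
class DynamicLossScalerSketch:
    """Hypothetical dynamic loss scaler; not AReaL's implementation."""

    def __init__(self, initial_loss_scale=2.0 ** 32, min_loss_scale=1.0,
                 loss_scale_window=5, hysteresis=2,
                 growth_factor=2.0, backoff_factor=0.5):
        self.scale = initial_loss_scale
        self.min_loss_scale = min_loss_scale
        self.loss_scale_window = loss_scale_window
        self.hysteresis = hysteresis
        self.growth_factor = growth_factor    # assumed value
        self.backoff_factor = backoff_factor  # assumed value
        self._good_steps = 0
        self._hysteresis_left = hysteresis

    def update(self, found_overflow: bool) -> float:
        if found_overflow:
            # Tolerate a few overflows before shrinking the scale.
            self._good_steps = 0
            self._hysteresis_left -= 1
            if self._hysteresis_left <= 0:
                self.scale = max(self.scale * self.backoff_factor,
                                 self.min_loss_scale)
        else:
            # Grow the scale after a full window of overflow-free steps.
            self._good_steps += 1
            if self._good_steps >= self.loss_scale_window:
                self._good_steps = 0
                self._hysteresis_left = self.hysteresis
                self.scale *= self.growth_factor
        return self.scale


scaler = DynamicLossScalerSketch(initial_loss_scale=1024.0,
                                 loss_scale_window=2, hysteresis=2)
history = [scaler.update(overflow) for overflow in (True, True, False, False)]
print(history)
```

In this reading, `hysteresis` counts how many overflowing steps are tolerated before the scale is reduced, and `min_loss_scale` is the floor the scale never drops below.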

Gradient Clipping

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| gradient_clipping | float | 1.0 | Gradient clipping threshold | areal/api/cli_args.py:361 |
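
The page describes `gradient_clipping` only as a threshold. A common interpretation, assumed here, is a cap on the global L2 norm of all gradients, in the spirit of `torch.nn.utils.clip_grad_norm_`. The pure-Python helper `clip_by_global_norm` below is a hypothetical illustration of that interpretation on a flat list of gradient values.

```python
import math


def clip_by_global_norm(grads, gradient_clipping=1.0, eps=1e-6):
    """Scale gradients so their global L2 norm is at most the threshold."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The coefficient is capped at 1.0, so small gradients are left untouched.
    coef = min(1.0, gradient_clipping / (total_norm + eps))
    return [g * coef for g in grads], total_norm


clipped, norm_before = clip_by_global_norm([3.0, 4.0], gradient_clipping=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

The direction of the gradient vector is preserved; only its magnitude is rescaled when the norm exceeds the threshold.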

Sources: areal/api/cli_args.py:306-375


FSDP Engine Configuration

FSDP (Fully Sharded Data Parallel) is PyTorch's native training backend supporting N-D parallelism. FSDPEngineConfig controls FSDP-specific behaviors.

FSDPEngineConfig

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| wrap_policy | FSDPWrapPolicy or None | None | FSDP wrap policy specifying model layers to wrap. None defaults to wrapping transformer decoder layers | areal/api/cli_args.py:391 |
| offload_params | boolean | False | Whether to offload FSDP parameters to CPU | areal/api/cli_args.py:396 |
| memory_efficient_load | boolean | False | Enable memory-efficient model loading | areal/api/cli_args.py:399 |
| shard_vision_across_sp | boolean | False | Shard vision encoder across SP ranks by image | areal/api/cli_args.py:408 |

FSDPWrapPolicy

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| transformer_layer_cls_to_wrap | list of string or None | None | List of transformer layer names for FSDP to wrap | areal/api/cli_args.py:381 |

Sources: areal/api/cli_args.py:388-417, areal/api/cli_args.py:378-385

FSDP Model Parallelization


Sources: areal/engine/fsdp_engine.py:218-222, areal/engine/fsdp_utils/parallel.py:86


Megatron Engine Configuration

Megatron-LM is NVIDIA's training framework supporting pipeline parallelism and expert parallelism. MegatronEngineConfig controls Megatron-Core specific features.

DistributedDataParallelConfig

Configuration for Megatron's DistributedDataParallel wrapper.

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| grad_reduce_in_fp32 | boolean | True | Reduce gradients in FP32 precision | areal/api/cli_args.py:573 |
| overlap_grad_reduce | boolean | False | Overlap gradient reduction with computation | areal/api/cli_args.py:574 |
| overlap_param_gather | boolean | False | Overlap parameter gather with computation | areal/api/cli_args.py:575 |
| align_param_gather | boolean | False | Align parameter gather operations | areal/api/cli_args.py:576 |
| use_distributed_optimizer | boolean | True | Use Megatron's distributed optimizer | areal/api/cli_args.py:577 |
| bucket_size | integer or None | None | Bucket size for gradient reduction | areal/api/cli_args.py:579 |

MegatronEngineConfig

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| wrap_with_ddp | boolean | True | Wrap model with DistributedDataParallel | areal/api/cli_args.py:695 |
| ddp | DistributedDataParallelConfig | default | DDP configuration | areal/api/cli_args.py:704 |
| virtual_pipeline_parallel_size | integer | 1 | Virtual pipeline parallel size for interleaved schedule | areal/api/cli_args.py:707 |
| bridge_type | string | "mbridge" | Bridge type for weight loading | areal/api/cli_args.py:692 |

Gradient Checkpointing Options

Only effective when TrainEngineConfig.gradient_checkpointing=True.

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| recompute_granularity | string or None | "full" | Recomputation granularity | areal/api/cli_args.py:741 |
| recompute_method | string or None | "uniform" | Recomputation method | areal/api/cli_args.py:746 |
| recompute_num_layers | integer or None | 1 | Number of layers to recompute | areal/api/cli_args.py:751 |

Sources: areal/api/cli_args.py:692-772
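
To illustrate what `virtual_pipeline_parallel_size` changes, the sketch below computes which transformer layers each pipeline rank would hold under the usual Megatron-style interleaved layout, where every rank owns several non-contiguous model chunks. The function `interleaved_layer_assignment` is hypothetical, and the pipeline parallel size is passed in directly as an assumption (on this page it is not part of MegatronEngineConfig; parallelism degrees come from the allocation mode).

```python
def interleaved_layer_assignment(num_layers, pipeline_parallel_size,
                                 virtual_pipeline_parallel_size=1):
    """Map (pp_rank, virtual_chunk) to the list of layer indices it holds."""
    num_chunks = pipeline_parallel_size * virtual_pipeline_parallel_size
    if num_layers % num_chunks != 0:
        raise ValueError("num_layers must be divisible by pp_size * vpp_size")
    per_chunk = num_layers // num_chunks
    assignment = {}
    for pp_rank in range(pipeline_parallel_size):
        for chunk in range(virtual_pipeline_parallel_size):
            # Chunks are dealt out round-robin across pipeline ranks.
            start = (chunk * pipeline_parallel_size + pp_rank) * per_chunk
            assignment[(pp_rank, chunk)] = list(range(start, start + per_chunk))
    return assignment


layout = interleaved_layer_assignment(
    num_layers=8, pipeline_parallel_size=2, virtual_pipeline_parallel_size=2
)
for key in sorted(layout):
    print(key, layout[key])
```

With a virtual size of 1 (the documented default), each rank simply holds one contiguous block of layers.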


Archon Engine Configuration

Archon is AReaL's experimental torch-native training backend. ArchonEngineConfig controls Archon-specific behaviors.

ArchonEngineConfig

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| attn_type | string | "varlen" | Attention backend type. Choices: "varlen", "sdpa", "tree" | areal/api/cli_args.py:423-427 |
| enable_compile | boolean | True | Enable torch.compile for TransformerBlocks | areal/api/cli_args.py:436 |
| pp_schedule | string | "Interleaved1F1B" | Pipeline parallel schedule | areal/api/cli_args.py:467-472 |
| pp_layers_per_stage | integer or None | None | Number of transformer layers per virtual pipeline stage | areal/api/cli_args.py:475 |

Sources: areal/api/cli_args.py:420-566

Archon Pipeline Parallelism Flow


Sources: areal/experimental/engine/archon_engine.py:183-187, areal/experimental/engine/archon_runner.py:56


FP8 Training Configuration

FP8EngineConfig encapsulates FP8 (8-bit floating point) training parameters. Currently supported by the Megatron engine.

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| mode | string | "e4m3" | FP8 precision mode | areal/api/cli_args.py:590-593 |
| recipe | string | "delayed" | FP8 scaling recipe | areal/api/cli_args.py:596-601 |
| param | boolean | False | Keep parameters in FP8 precision to save memory | areal/api/cli_args.py:612 |
| direct_convert | boolean | True | Use direct FP8 conversion during weight updates | areal/api/cli_args.py:686 |

Sources: areal/api/cli_args.py:587-689
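
As background for the `mode` and `recipe` options, the sketch below shows the basic arithmetic behind delayed scaling: the scale applied to a tensor is derived from the largest absolute value seen over a window of earlier steps, so that this value maps to the top of the FP8 range. The maximum representable magnitudes (448 for E4M3, 57344 for E5M2) are properties of the formats; the helper `delayed_scale`, its `margin` parameter, and the history handling are illustrative assumptions. Which mode strings AReaL accepts beyond the documented default "e4m3" is not stated on this page.

```python
# Largest finite magnitude representable in each FP8 format.
FP8_MAX = {"e4m3": 448.0, "e5m2": 57344.0}


def delayed_scale(amax_history, mode="e4m3", margin=0):
    """Scale factor derived from the max |x| recorded over earlier steps."""
    amax = max(amax_history)
    if amax <= 0.0:
        # No usable history yet: fall back to a neutral scale.
        return 1.0
    # Multiplying a tensor by this scale maps amax to FP8_MAX / 2**margin.
    return FP8_MAX[mode] / amax / (2 ** margin)


scale = delayed_scale([1.0, 2.0, 0.5], mode="e4m3")
print(scale)
```

Because the scale comes from earlier steps rather than the current tensor, a sudden spike in magnitude can saturate until the history catches up, which is the usual trade-off of a delayed recipe.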


Scheduling Configuration

Scheduling configurations control how training workers are allocated across the cluster.

SchedulingSpec

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| cpu | integer | 8 | CPU cores required per GPU | areal/api/cli_args.py:802 |
| gpu | integer | 0 | GPU units required | areal/api/cli_args.py:805 |
| mem | integer | 32 | RAM (GB) required per GPU | areal/api/cli_args.py:808 |
| task_type | string | "worker" | Choices: "worker", "engine" | areal/api/cli_args.py:817-821 |

SchedulingStrategy

| Parameter | Type | Default | Description | Source |
| --- | --- | --- | --- | --- |
| type | string | "separation" | Choices: "separation", "colocation" | areal/api/cli_args.py:783-787 |
| target | string or None | None | Role to colocate with | areal/api/cli_args.py:790 |

Sources: areal/api/cli_args.py:780-886
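
The two SchedulingStrategy fields suggest a simple consistency rule, sketched below: a "colocation" strategy needs a `target` role to colocate with, while "separation" does not. This rule is an assumption inferred from the field descriptions, and `SchedulingStrategySketch` with its `validate` method is a hypothetical stand-in rather than the class defined in areal/api/cli_args.py. The role name "actor" is likewise a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulingStrategySketch:
    # Hypothetical stand-in; field names and defaults follow the table.
    type: str = "separation"
    target: Optional[str] = None

    def validate(self) -> None:
        if self.type not in ("separation", "colocation"):
            raise ValueError(f"unknown strategy type: {self.type!r}")
        # Assumed rule: colocation is meaningless without a target role.
        if self.type == "colocation" and not self.target:
            raise ValueError("colocation requires a target role")


separate = SchedulingStrategySketch()
colocated = SchedulingStrategySketch(type="colocation", target="actor")
for strategy in (separate, colocated):
    strategy.validate()
    print(strategy)
```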

Scheduling Strategies Diagram


Sources: areal/api/cli_args.py:780-796