
URL: https://deepwiki.com/inclusionAI/AReaL/3.4-archonengine

ArchonEngine | inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

ArchonEngine

Purpose and Scope

ArchonEngine is a custom torch-native training backend that implements the TrainEngine interface (areal/experimental/engine/archon_engine.py147-148). It provides multi-dimensional parallelism (DP, TP, PP, CP, EP) using PyTorch's native distributed primitives (FSDP2, DTensor, DeviceMesh) without Megatron-Core dependencies. The engine is optimized for Mixture-of-Experts (MoE) models and supports advanced pipeline schedules and FP8 training.
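To make the idea of a multi-dimensional layout concrete, the sketch below maps a flat rank to per-dimension coordinates in row-major order, which is the default arrangement PyTorch's init_device_mesh uses. The dimension sizes and the helper name mesh_coords are hypothetical and are not taken from ArchonEngine. EP is left out because this page does not describe how it maps onto the mesh.

```python
from math import prod


def mesh_coords(rank: int, dims: dict[str, int]) -> dict[str, int]:
    """Map a flat rank to per-dimension coordinates (row-major, last dim fastest)."""
    world_size = prod(dims.values())
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    coords: dict[str, int] = {}
    remaining = rank
    # Peel off the fastest-varying (last) dimension first.
    for name in reversed(list(dims)):
        remaining, coords[name] = divmod(remaining, dims[name])
    # Return coordinates in the same order as the mesh dimensions.
    return {name: coords[name] for name in dims}


# Hypothetical 16-GPU layout: 2 pipeline stages x 2 data replicas x 1 context shard x 4 tensor shards.
DIMS = {"pp": 2, "dp": 2, "cp": 1, "tp": 4}

if __name__ == "__main__":
    for r in (0, 5, 15):
        print(r, mesh_coords(r, DIMS))
```

With this layout, ranks 0 to 3 share the same pp, dp, and cp coordinates and differ only in tp, so they would form one tensor-parallel group.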

For other training backends, see FSDPEngine (page 3.2) and MegatronEngine (page 3.3).

Supported Parallelism: data (DP), tensor (TP), pipeline (PP), context (CP), and expert (EP) parallelism.

Key Components:

Sources: areal/experimental/engine/archon_engine.py147-200 areal/experimental/engine/archon_runner.py30-53 areal/experimental/engine/archon_weight_sync.py30-46 areal/experimental/models/archon/qwen3_5/model/state_dict_adapter.py21-32


Architecture Overview

Component Architecture

Diagram: ArchonEngine Core Entities


Diagram: Model Implementation and Registry Mapping


Sources: areal/experimental/engine/archon_engine.py157-173 areal/experimental/engine/archon_runner.py56-124 areal/experimental/models/archon/qwen3/model/args.py18-54 areal/experimental/models/archon/qwen3_5/model/state_dict_adapter.py21-136


ForwardBackwardRunner

The ForwardBackwardRunner (areal/experimental/engine/archon_runner.py30) handles the micro-batch execution logic. It abstracts the differences between standard sequential execution and pipeline-parallel schedules.

SequentialRunner

SequentialRunner (areal/experimental/engine/archon_runner.py56) is used when pipeline parallelism is disabled (pp=1). It iterates through micro-batches, performing a forward pass and an optional backward pass for each. It also supports TreeAttentionMeta when tree training is active (areal/experimental/engine/archon_runner.py81-107).
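The micro-batch loop described here can be pictured with a minimal sketch. The function run_sequential and its callback parameters are hypothetical stand-ins, not the SequentialRunner API. Passing no backward callback models a forward-only pass.

```python
from typing import Callable, Iterable, Optional


def run_sequential(
    micro_batches: Iterable[list[float]],
    forward_fn: Callable[[list[float]], float],
    backward_fn: Optional[Callable[[float], None]] = None,
) -> list[float]:
    """Run forward (and optionally backward) on each micro-batch in order."""
    losses: list[float] = []
    for mb in micro_batches:
        loss = forward_fn(mb)
        if backward_fn is not None:
            # Backward is skipped entirely for evaluation-style passes.
            backward_fn(loss)
        losses.append(loss)
    return losses


if __name__ == "__main__":
    seen: list[float] = []
    out = run_sequential([[1.0, 2.0], [3.0]], forward_fn=sum, backward_fn=seen.append)
    print(out, seen)
```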

PipelinedRunner

PipelinedRunner (areal/experimental/engine/archon_runner.py124) is used when pp > 1. It executes pipeline schedules through torch.distributed.pipelining.
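The choice between the two runners depends only on the pipeline-parallel degree. The sketch below shows that dispatch with toy classes. Runner, ToySequentialRunner, ToyPipelinedRunner, and make_runner are hypothetical names, and the run methods return a description instead of executing a model.

```python
from abc import ABC, abstractmethod


class Runner(ABC):
    """Common interface so callers do not care which execution mode is active."""

    @abstractmethod
    def run(self, micro_batches: list) -> str: ...


class ToySequentialRunner(Runner):
    def run(self, micro_batches: list) -> str:
        return f"sequential over {len(micro_batches)} micro-batches"


class ToyPipelinedRunner(Runner):
    def __init__(self, pp: int) -> None:
        self.pp = pp

    def run(self, micro_batches: list) -> str:
        return f"pipelined over {len(micro_batches)} micro-batches on {self.pp} stages"


def make_runner(pp: int) -> Runner:
    """Pick the sequential runner for pp == 1 and the pipelined one otherwise."""
    if pp < 1:
        raise ValueError("pp must be at least 1")
    return ToySequentialRunner() if pp == 1 else ToyPipelinedRunner(pp)


if __name__ == "__main__":
    print(make_runner(1).run([1, 2, 3]))
    print(make_runner(4).run([1, 2, 3]))
```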

Sources: areal/experimental/engine/archon_runner.py30-175 areal/experimental/engine/archon_engine.py18-22


Weight Synchronization

ArchonEngine synchronizes weights between the training and inference engines through the archon_weight_sync.py module.

XCCL Synchronization

update_weights_from_distributed (areal/experimental/engine/archon_weight_sync.py114) performs a live broadcast of weights over NCCL/XCCL.

  1. Initialization: init_weight_update_group sets up a dedicated TCP store and process group for the transfer (areal/experimental/engine/archon_weight_sync.py49-92).
  2. Buffering: Weights are collected into buckets (sized by weight_chunked_mem_mb) to optimize network throughput (areal/experimental/engine/archon_weight_sync.py131-163).
  3. DTensor Handling: _get_full_tensor (areal/experimental/engine/archon_weight_sync.py95) handles DTensor inputs by calling full_tensor() to gather sharded weights before broadcasting (areal/experimental/engine/archon_weight_sync.py98-106).
  4. Coordination: The training engine pauses inference generation via pause_generation(), performs the broadcast using _update_bucket_weights(), and then resumes generation (areal/experimental/engine/archon_weight_sync.py127-172).
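The buffering and coordination steps can be sketched without any distributed machinery. In the sketch below, bucket_by_size groups parameter names under a megabyte limit, and sync_weights wraps the sends in a pause and resume pair. Both function names, the callback parameters, and the try/finally guard are assumptions made for illustration and are not the module's actual implementation.

```python
from typing import Callable, Iterable, Iterator


def bucket_by_size(
    named_sizes: Iterable[tuple[str, int]], limit_mb: float
) -> Iterator[list[str]]:
    """Group parameter names so each bucket stays within limit_mb where possible."""
    limit = int(limit_mb * 1024 * 1024)
    bucket: list[str] = []
    used = 0
    for name, nbytes in named_sizes:
        # Flush before adding a tensor that would overflow a non-empty bucket.
        if bucket and used + nbytes > limit:
            yield bucket
            bucket, used = [], 0
        # A tensor larger than the limit still gets a bucket of its own.
        bucket.append(name)
        used += nbytes
    if bucket:
        yield bucket


def sync_weights(
    named_sizes: Iterable[tuple[str, int]],
    limit_mb: float,
    pause: Callable[[], None],
    broadcast: Callable[[list[str]], None],
    resume: Callable[[], None],
) -> int:
    """Pause inference, send every bucket, then resume; returns the bucket count."""
    pause()
    sent = 0
    try:
        for bucket in bucket_by_size(named_sizes, limit_mb):
            broadcast(bucket)
            sent += 1
    finally:
        # Resume even if a send fails, so inference is not left paused.
        resume()
    return sent


MB = 1024 * 1024

if __name__ == "__main__":
    sizes = [("a", 1 * MB), ("b", 1 * MB), ("c", 3 * MB), ("d", 1 * MB)]
    log: list[str] = []
    n = sync_weights(
        sizes,
        limit_mb=2,
        pause=lambda: log.append("pause"),
        broadcast=lambda b: log.append("send:" + ",".join(b)),
        resume=lambda: log.append("resume"),
    )
    print(n, log)
```

Larger buckets mean fewer collective calls at the cost of more temporary memory, which is the trade-off a setting such as weight_chunked_mem_mb controls.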

Sources: areal/experimental/engine/archon_weight_sync.py49-210


Checkpointing and State Management

ArchonEngine uses PyTorch Distributed Checkpoint (DCP) for sharded saving and loading.

DCPState

The DCPState class (areal/experimental/engine/archon_checkpoint.py86) wraps model parts and optimizers for DCP operations.
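DCP works with objects that expose state_dict() and load_state_dict(), so a wrapper of this kind can be pictured as below. ToyDCPState and ToyPart are hypothetical, the "model" and "optim" keys are an assumed layout, and plain dictionaries stand in for real modules and optimizers.

```python
class ToyPart:
    """Stand-in for a model part or optimizer with a dict-based state."""

    def __init__(self, params: dict) -> None:
        self.params = dict(params)

    def state_dict(self) -> dict:
        return dict(self.params)

    def load_state_dict(self, state: dict) -> None:
        self.params = dict(state)


class ToyDCPState:
    """Bundle several model parts and optimizers behind one state interface."""

    def __init__(self, model_parts: list, optimizers: list) -> None:
        self.model_parts = model_parts
        self.optimizers = optimizers

    def state_dict(self) -> dict:
        model: dict = {}
        for part in self.model_parts:
            # Parts are assumed to own disjoint parameter names.
            model.update(part.state_dict())
        return {"model": model, "optim": [o.state_dict() for o in self.optimizers]}

    def load_state_dict(self, state: dict) -> None:
        for part in self.model_parts:
            # Hand each part only the keys it already owns.
            own_keys = part.state_dict().keys()
            part.load_state_dict({k: state["model"][k] for k in own_keys})
        for opt, opt_state in zip(self.optimizers, state["optim"]):
            opt.load_state_dict(opt_state)


if __name__ == "__main__":
    saved = ToyDCPState(
        [ToyPart({"layers.0.w": 1.0}), ToyPart({"layers.1.w": 2.0})],
        [ToyPart({"step": 10})],
    ).state_dict()
    fresh = ToyDCPState(
        [ToyPart({"layers.0.w": 0.0}), ToyPart({"layers.1.w": 0.0})],
        [ToyPart({"step": 0})],
    )
    fresh.load_state_dict(saved)
    print(fresh.state_dict())
```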

State Dict Adapters

The engine uses BaseStateDictAdapter (areal/experimental/models/archon/base.py57) to convert between HuggingFace and Archon internal formats. Qwen3_5StateDictAdapter is one such adapter.

Sources: areal/experimental/engine/archon_checkpoint.py86-166 areal/experimental/models/archon/base.py57-142 areal/experimental/models/archon/qwen3_5/model/state_dict_adapter.py21-136
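A conversion between HuggingFace and internal key names, as a state dict adapter performs, can be sketched as a table of rename patterns. The HuggingFace-style keys below follow the common transformers naming, but the internal names, the HF_TO_INTERNAL table, and the from_hf function are hypothetical and do not reproduce the Archon adapters.

```python
import re

# Hypothetical rename rules: (pattern on the HuggingFace key, internal replacement).
HF_TO_INTERNAL = [
    (re.compile(r"^model\.embed_tokens\.(.+)$"), r"tok_embeddings.\1"),
    (
        re.compile(r"^model\.layers\.(\d+)\.self_attn\.q_proj\.(.+)$"),
        r"layers.\1.attention.wq.\2",
    ),
]


def from_hf(hf_state: dict) -> dict:
    """Rename HuggingFace-style keys to the (assumed) internal naming."""
    out = {}
    for key, value in hf_state.items():
        for pattern, replacement in HF_TO_INTERNAL:
            if pattern.match(key):
                out[pattern.sub(replacement, key)] = value
                break
        else:
            # Keys without a rule pass through unchanged in this sketch.
            out[key] = value
    return out


if __name__ == "__main__":
    hf = {
        "model.embed_tokens.weight": "E",
        "model.layers.3.self_attn.q_proj.weight": "Q",
        "lm_head.weight": "H",
    }
    print(from_hf(hf))
```

A real adapter would also need the inverse mapping for export, and may have to split, merge, or reshape tensors as well as rename them.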