Last indexed: 7 May 2026 (2e12c1)

Weight Synchronization

Weight synchronization is the mechanism by which training engines propagate updated model parameters to inference engines during online reinforcement learning. This process ensures that rollout generation uses the most recent policy, enabling asynchronous RL workflows where training and inference occur on separate GPU pools.

For information about how training engines manage their own checkpoints and persistence, see 3.7 Checkpointing and Recovery. For details on inference engine lifecycle management, see 4.6 Server Lifecycle Management.

Overview

In AReaL's asynchronous RL architecture, training engines and inference engines operate independently on different GPU pools. After each training step, the training engine must propagate its updated weights to inference engines so that subsequent rollouts use the latest policy. This process is called weight synchronization or weight update.

The system supports three primary update modes:

XCCL mode: GPU-to-GPU transfer using collective communication (NCCL on NVIDIA GPUs, HCCL on Ascend NPUs).
Disk mode: File system-based transfer through checkpoint save/load.
AWEX mode: An optimized asynchronous weight exchange protocol (experimental).

Weight synchronization is controlled by the weight_update_mode configuration parameter and orchestrated through the WeightUpdateMeta dataclass, which encapsulates all metadata required for the update operation areal/api/io_struct.py167-185.

Sources: areal/api/io_struct.py167-211 areal/api/engine_api.py175-184

Weight Update Modes

XCCL Mode (GPU-to-GPU)

XCCL mode performs direct GPU-to-GPU weight transfer using PyTorch's distributed communication primitives (primarily torch.distributed.broadcast). Because it skips the serialize, write, and reload round trip of disk mode, it is generally faster, but it requires establishing a shared process group between training and inference engines.

Key characteristics:

Zero Disk I/O: Parameters move directly between GPU memories.
Process Group Initialization: Requires an NCCL/HCCL process group spanning training and inference nodes, created with init_custom_process_group areal/engine/vllm_ext/vllm_worker_extension.py23.
Rendezvous: The group is formed at nccl_master_address and nccl_master_port areal/api/io_struct.py172-173.
Inference Implementation: vLLMBackend uses a two-step process: it first sends metadata via /areal_set_update_weight_meta, then triggers the update via /areal_update_weights_xccl areal/engine/vllm_remote.py180-181.
Worker Execution: The vLLM worker receives each parameter via torch.distributed.broadcast, with rank 0 of the update group as the source, and applies it to the model areal/engine/vllm_ext/vllm_worker_extension.py152-157.

Sources: areal/api/io_struct.py168-174 areal/engine/vllm_remote.py147-186 areal/engine/vllm_ext/vllm_worker_extension.py133-165
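The two-step XCCL handshake can be illustrated with a single-process simulation. This is a sketch under stated assumptions, not AReaL's implementation: plain lists stand in for tensors, the hypothetical FakeBroadcastGroup stands in for torch.distributed.broadcast with rank 0 as the source, and the worker method names only echo the two endpoints. It shows why metadata goes first: a broadcast carries no names, so receivers must already know each parameter's name, size, and position in the send order.

```python
# Single-process simulation of the XCCL two-step update (a sketch, not AReaL code).
# Assumptions: lists stand in for tensors, FakeBroadcastGroup stands in for
# torch.distributed.broadcast(src=0), and all class/method names are hypothetical.
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    name: str
    numel: int


class FakeBroadcastGroup:
    """Delivers every tensor sent by rank 0 to every receiver, in send order."""

    def __init__(self, num_receivers: int) -> None:
        self._queues = [deque() for _ in range(num_receivers)]

    def broadcast_from_src(self, tensor: list) -> None:
        for queue in self._queues:
            queue.append(list(tensor))

    def recv(self, receiver_idx: int) -> list:
        return self._queues[receiver_idx].popleft()


class InferenceWorker:
    def __init__(self, group: FakeBroadcastGroup, idx: int) -> None:
        self.group, self.idx = group, idx
        self.pending = []
        self.weights = {}

    def set_update_weight_meta(self, specs: list) -> None:
        # Step 1 (cf. /areal_set_update_weight_meta): learn names, sizes, and order.
        self.pending = list(specs)

    def update_weights_xccl(self) -> None:
        # Step 2 (cf. /areal_update_weights_xccl): receive in the agreed order.
        for spec in self.pending:
            buf = self.group.recv(self.idx)
            if len(buf) != spec.numel:
                raise ValueError(f"{spec.name}: expected {spec.numel}, got {len(buf)}")
            self.weights[spec.name] = buf
        self.pending = []


def push_weights(state_dict: dict, group: FakeBroadcastGroup, workers: list) -> None:
    specs = [ParamSpec(name, len(t)) for name, t in state_dict.items()]
    for worker in workers:
        worker.set_update_weight_meta(specs)
    # In a real group, sender and receivers enter broadcast concurrently;
    # here rank 0 enqueues everything first and the workers drain afterwards.
    for tensor in state_dict.values():
        group.broadcast_from_src(tensor)
    for worker in workers:
        worker.update_weights_xccl()
```

The size check turns a metadata/payload mismatch, the main failure mode of name-less broadcasts, into an explicit error instead of silently corrupted weights.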

Disk Mode (Filesystem-based)

Disk mode saves updated weights to a shared filesystem location and notifies inference engines to reload from disk via HTTP endpoints.

Key characteristics:

Robustness: No process group coordination required; works across heterogeneous clusters or when network topology prevents direct NCCL connections.
Implementation: Training engines save state dicts (e.g., via torch.save or save_model_to_hf) to the path specified in WeightUpdateMeta areal/api/io_struct.py169.
Inference Reload: SGLangBackend builds requests for /update_weights_from_disk areal/engine/sglang_remote.py151-158, and vLLMBackend uses /areal_update_weights for full-model updates from disk areal/engine/vllm_remote.py141.

Sources: areal/api/io_struct.py167-181 areal/engine/sglang_remote.py128-159 areal/engine/vllm_remote.py126-145
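A disk-mode update has two halves: publish the weights to shared storage, then tell each inference server where to reload from. The sketch below assumes JSON as a stand-in for torch.save or save_model_to_hf, a hypothetical v{version} directory layout, and an assumed model_path field for the vLLM /areal_update_weights payload; the SGLang /update_weights_from_disk endpoint does take a model_path field.

```python
# Sketch of disk-mode publishing (assumptions: JSON stands in for torch.save /
# save_model_to_hf, the "v{version}" layout is hypothetical, and the vLLM
# payload field name is assumed; SGLang's endpoint takes model_path).
import json
import os


def save_weights_to_disk(state_dict: dict, root: str, version: int) -> str:
    path = os.path.join(root, f"v{version}")
    os.makedirs(path, exist_ok=True)
    final = os.path.join(path, "weights.json")
    tmp = final + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state_dict, f)
    # Atomic rename: a reloading server never observes a half-written file.
    os.replace(tmp, final)
    return path


def build_reload_request(backend: str, path: str) -> tuple:
    """Return (endpoint, JSON payload) for notifying an inference server."""
    if backend == "sglang":
        return "/update_weights_from_disk", {"model_path": path}
    if backend == "vllm":
        return "/areal_update_weights", {"model_path": path}  # field name assumed
    raise ValueError(f"unknown backend: {backend!r}")
```

Writing to a temporary file and renaming it is one way to satisfy the ordering requirement of disk mode: the HTTP notification should only be sent once the files it points to are complete.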

WeightUpdateMeta Structure

The WeightUpdateMeta dataclass carries all metadata required for a weight update operation. It is created by the training engine and passed to both the training and inference engines during synchronization.

Field	Type	Description
`type`	`"disk"`, `"xccl"`, `"awex"`	Weight update mode areal/api/io_struct.py168
`path`	`str | None`	Filesystem path for disk mode areal/api/io_struct.py169
`nccl_group_name`	`str | None`	Process group identifier for XCCL areal/api/io_struct.py174
`use_lora`	`bool`	Whether updating LoRA adapters instead of full model areal/api/io_struct.py177
`lora_name`	`str`	LoRA adapter identifier areal/api/io_struct.py178
`version`	`int | None`	Monotonically increasing version number areal/api/io_struct.py185

Sources: areal/api/io_struct.py167-185
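The fields in the table can be mirrored in a small frozen dataclass to show how they constrain each other. WeightUpdateMetaSketch is a hypothetical stand-in, not AReaL's class, and its validation rules (disk needs a path, XCCL needs a group name, LoRA needs an adapter name, versions only increase) are assumptions drawn from the field descriptions.

```python
# Hypothetical mirror of the WeightUpdateMeta fields listed in the table.
# The validation rules are illustrative assumptions, not AReaL's actual checks.
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Literal


@dataclass(frozen=True)
class WeightUpdateMetaSketch:
    type: Literal["disk", "xccl", "awex"]
    path: str | None = None
    nccl_group_name: str | None = None
    use_lora: bool = False
    lora_name: str = ""
    version: int | None = None

    def __post_init__(self) -> None:
        if self.type not in ("disk", "xccl", "awex"):
            raise ValueError(f"unknown update mode: {self.type!r}")
        if self.type == "disk" and not self.path:
            raise ValueError("disk mode needs a path")
        if self.type == "xccl" and not self.nccl_group_name:
            raise ValueError("xccl mode needs nccl_group_name")
        if self.use_lora and not self.lora_name:
            raise ValueError("LoRA updates need lora_name")

    def with_version(self, version: int) -> WeightUpdateMetaSketch:
        # Enforce the "monotonically increasing" property from the table.
        if self.version is not None and version <= self.version:
            raise ValueError("versions must increase monotonically")
        return replace(self, version=version)  # re-runs __post_init__
```

Making the object frozen and deriving each new version with replace keeps one update's metadata from being mutated while another component is still reading it.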

Training Engine Integration

Training engines implement the update_weights() and connect_engine() methods to manage weight synchronization with inference engines areal/api/engine_api.py175-194.
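The division of labor between the two methods can be sketched with a toy loop: connect_engine() runs once to set up the channel, and update_weights() runs after every training step with a fresh version. The signatures, the dict used as meta, and both stub classes are hypothetical and do not reproduce the real engine_api.py interface.

```python
# Hypothetical sketch of the train -> update_weights -> rollout cycle.
# Method names follow the text; signatures, dict-based meta, and stubs are assumed.
from __future__ import annotations

from abc import ABC, abstractmethod


class InferenceEngineStub:
    def __init__(self) -> None:
        self.weights: dict[str, float] = {}
        self.version = 0

    def receive(self, weights: dict[str, float], version: int) -> None:
        self.weights = dict(weights)  # copy, so later training does not leak in
        self.version = version


class TrainEngineSketch(ABC):
    @abstractmethod
    def connect_engine(self, engine: InferenceEngineStub, meta: dict) -> None: ...

    @abstractmethod
    def update_weights(self, meta: dict) -> None: ...


class ToyTrainEngine(TrainEngineSketch):
    def __init__(self) -> None:
        self.params = {"w": 0.0}
        self._engine: InferenceEngineStub | None = None

    def connect_engine(self, engine: InferenceEngineStub, meta: dict) -> None:
        # A real engine would build the NCCL group (xccl) or agree on a path (disk).
        self._engine = engine

    def train_step(self) -> None:
        self.params["w"] += 1.0  # stand-in for an optimizer step

    def update_weights(self, meta: dict) -> None:
        if self._engine is None:
            raise RuntimeError("connect_engine() must be called first")
        self._engine.receive(self.params, meta["version"])


def run(steps: int) -> InferenceEngineStub:
    trainer, engine = ToyTrainEngine(), InferenceEngineStub()
    trainer.connect_engine(engine, {"type": "xccl"})
    for step in range(1, steps + 1):
        trainer.train_step()
        trainer.update_weights({"type": "xccl", "version": step})
    return engine
```

For example, run(3) leaves the stub engine at version 3 with w equal to 3.0.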

Connection Establishment

Training engines are responsible for initializing the weight update group. For example, AwexMegatronAdapter calls init_weights_update_group to establish a shared communication channel for the training ranks areal/experimental/weight_update/awex/megatron_adapter.py154-162.

[Diagram: Code Entity Interaction (Connection)]

Sources: areal/api/engine_api.py185-194 areal/experimental/weight_update/awex/megatron_adapter.py128-162 areal/experimental/weight_update/nccl_group.py13-23
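Joining a shared update group requires every participant to agree on the world size and on its own rank. The sketch below shows one common layout, assumed rather than taken from AReaL: a single trainer rank is rank 0 (the broadcast source), and each inference server's tensor-parallel workers take consecutive ranks starting at a per-server rank_offset, similar to the rank_offset argument SGLang's init_weights_update_group accepts. GroupJoinArgs and both functions are hypothetical.

```python
# Sketch of one common rank layout for a trainer -> inference update group.
# Assumptions: one trainer source rank (rank 0); each server's TP workers take
# consecutive ranks from rank_offset. All names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupJoinArgs:
    master_address: str
    master_port: int
    group_name: str
    world_size: int
    rank_offset: int


def plan_update_group(num_servers: int, tp_size: int, master_address: str,
                      master_port: int, group_name: str) -> list:
    if num_servers < 1 or tp_size < 1:
        raise ValueError("need at least one server and tp_size >= 1")
    world_size = 1 + num_servers * tp_size  # +1 for the trainer's source rank
    return [
        GroupJoinArgs(master_address, master_port, group_name, world_size,
                      rank_offset=1 + i * tp_size)
        for i in range(num_servers)
    ]


def worker_ranks(args: GroupJoinArgs, tp_size: int) -> list:
    # Global rank of each TP worker on one server.
    return [args.rank_offset + tp_rank for tp_rank in range(tp_size)]
```

With two servers at TP size 4, the group has 9 ranks and the servers start at offsets 1 and 5, so no two workers collide and rank 0 stays free for the trainer.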

Weight Update Execution

The update_weights() method performs the actual weight transfer. In AwexMegatronAdapter, this involves building a transfer plan, preparing send operations with nccl_build_send_ops, and executing them with batch_send_recv areal/experimental/weight_update/awex/megatron_adapter.py173-179.

[Diagram: Weight Update Logic Flow]

Sources: areal/experimental/weight_update/awex/megatron_adapter.py163-180 areal/api/engine_api.py175-183
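The idea of a transfer plan can be sketched for the simplest case: one parameter split evenly along dim 0 across src_tp training ranks and re-split across dst_tp inference ranks. Each plan entry records which global rows a source rank sends to which destination rank. The hypothetical send_ops_for plays a role loosely analogous to nccl_build_send_ops, and execute_plan copies rows in place of batch_send_recv; this is an illustration of the concept, not the AWEX algorithm.

```python
# Sketch of a resharding transfer plan for one dim-0-sharded parameter.
# Assumptions: even splits, lists as tensors, hypothetical names throughout.
from typing import NamedTuple


class TransferOp(NamedTuple):
    src: int    # source (training) rank
    dst: int    # destination (inference) rank
    start: int  # first global row, inclusive
    end: int    # last global row, exclusive


def shard_range(num_rows: int, parts: int, idx: int) -> tuple:
    if num_rows % parts:
        raise ValueError("num_rows must divide evenly across ranks")
    size = num_rows // parts
    return idx * size, (idx + 1) * size


def build_transfer_plan(num_rows: int, src_tp: int, dst_tp: int) -> list:
    plan = []
    for s in range(src_tp):
        s0, s1 = shard_range(num_rows, src_tp, s)
        for d in range(dst_tp):
            d0, d1 = shard_range(num_rows, dst_tp, d)
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # only overlapping row ranges need a transfer
                plan.append(TransferOp(s, d, lo, hi))
    return plan


def send_ops_for(plan: list, src_rank: int) -> list:
    return [op for op in plan if op.src == src_rank]


def execute_plan(plan: list, src_shards: list, num_rows: int, dst_tp: int) -> list:
    src_size = num_rows // len(src_shards)
    dst_size = num_rows // dst_tp
    dst_shards = [[None] * dst_size for _ in range(dst_tp)]
    for rank in range(len(src_shards)):
        for op in send_ops_for(plan, rank):
            for row in range(op.start, op.end):
                local_src = row - rank * src_size
                local_dst = row - op.dst * dst_size
                dst_shards[op.dst][local_dst] = src_shards[rank][local_src]
    return dst_shards
```

Computing the plan once from the two layouts means each rank only sends the rows some destination actually needs, rather than gathering the full tensor first.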

Version Management

Weight version tracking lets the system detect rollouts generated by outdated policies, which prevents training on overly stale data and enables staleness-aware algorithms.

Version Tracking in Responses

Inference engines record the current weight version in each generated response: the output_versions field in ModelResponse stores the version of the model used to generate each output token areal/api/io_struct.py68. Versioned LoRA names are generated with get_versioned_lora_name(lora_name, version) areal/api/io_struct.py161-163.

Sources: areal/api/io_struct.py63-68 areal/api/io_struct.py161-163
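Per-token versions make it possible to measure how far behind a sample is. A plausible reason to track them per token rather than per response is that a single generation can span a weight update. The sketch assumes versions are plain ints per output token and that the oldest contributing version defines staleness; staleness, filter_fresh, and the max_staleness threshold are hypothetical.

```python
# Sketch: measuring and bounding staleness from per-token output_versions.
# Assumptions: versions are ints per output token; the oldest token decides;
# the filtering policy and all names are illustrative, not AReaL's.

def staleness(output_versions: list, current_version: int) -> int:
    if not output_versions:
        raise ValueError("no generated tokens")
    # The oldest policy that contributed any token determines staleness.
    return current_version - min(output_versions)


def filter_fresh(batch: list, current_version: int, max_staleness: int) -> list:
    """Keep samples whose oldest token is at most max_staleness versions old."""
    return [v for v in batch if staleness(v, current_version) <= max_staleness]
```

For example, a sample with versions [3, 3, 4] read at version 5 has staleness 2 and would be dropped under a threshold of 1.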

LoRA Support

AReaL provides native support for LoRA weight synchronization, allowing efficient updates of adapter layers without reloading the base model.

LoRA Update Protocol

LoRA updates require specific fields in WeightUpdateMeta:

use_lora: Must be True areal/api/io_struct.py177
lora_name: The identifier for the adapter areal/api/io_struct.py178
peft_config: Contains adapter parameters like target_modules, r (rank), and lora_alpha areal/api/io_struct.py181

Sources: areal/api/io_struct.py177-181
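To make r and lora_alpha concrete: under the standard LoRA formulation (as used by Hugging Face PEFT's LoraConfig), an adapted weight behaves like W + (lora_alpha / r) * B @ A, where A has shape r x in and B has shape out x r. The sketch uses plain lists instead of tensors to show that scaling and why LoRA sync is cheap: only A and B need to move. It does not model target_modules or how AReaL applies adapters, and the helper names are hypothetical.

```python
# Sketch of what r and lora_alpha mean for the weights being synchronized.
# Standard LoRA: W_eff = W + (lora_alpha / r) * B @ A, A: r x in, B: out x r.
# Lists stand in for tensors; target_modules is not modeled.

def matmul(x: list, y: list) -> list:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merged_weight(w: list, lora_a: list, lora_b: list, r: int, lora_alpha: float) -> list:
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)  # out x in low-rank update
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


def synced_params(out_dim: int, in_dim: int, r: int) -> int:
    # Only A (r x in) and B (out x r) travel, not the out x in base matrix.
    return r * in_dim + out_dim * r
```

For a 4096 x 4096 projection at r = 16, synced_params returns 131072, under 1% of the 16777216 entries in the base matrix.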

Backend Implementation

vLLM: Supports XCCL-based LoRA updates via /areal_update_weight_lora_xccl areal/engine/vllm_ext/vllm_worker_extension.py167-172 and disk-based LoRA updates via /v1/load_lora_adapter areal/engine/vllm_remote.py134-138. The worker extension manages vLLM's internal LoRARequest lifecycle and in-place reloading areal/engine/vllm_ext/vllm_worker_extension.py71-82.
SGLang: Supports disk-based LoRA updates via /load_lora_adapter areal/engine/sglang_remote.py139-144. SGLang's fused parameters (such as qkv_proj or gate_up_proj) are unfused by the AwexSGLangAdapter to match the HuggingFace-style names used during training areal/experimental/weight_update/awex/sglang_adapter.py112-146.
Megatron: Supports converting Megatron-format LoRA weights (e.g., for Qwen3 MoE) to HF format for consumption by inference engines areal/engine/megatron_utils/megatron_lora.py37-121.

Sources: areal/engine/vllm_remote.py130-186 areal/engine/sglang_remote.py132-172 areal/engine/vllm_ext/vllm_worker_extension.py58-131 areal/experimental/weight_update/awex/sglang_adapter.py112-160
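Unfusing amounts to slicing a fused weight along its output dimension and renaming the pieces. The sketch assumes the usual fused layout (q, then k, then v rows for qkv_proj; gate, then up rows for gate_up_proj), unsharded weights, and row sizes derived from num_heads, num_kv_heads, head_dim, and intermediate_size; the function is hypothetical and ignores any tensor-parallel interleaving the real adapter may handle.

```python
# Sketch: splitting fused SGLang-style weights into HF-style names.
# Assumptions: unsharded weights, rows stacked q|k|v and gate|up, lists as tensors.

def unfuse(name: str, rows: list, num_heads: int, num_kv_heads: int,
           head_dim: int, intermediate_size: int) -> dict:
    if "qkv_proj" in name:
        fused = "qkv_proj"
        q_rows, kv_rows = num_heads * head_dim, num_kv_heads * head_dim
        parts = [("q_proj", q_rows), ("k_proj", kv_rows), ("v_proj", kv_rows)]
    elif "gate_up_proj" in name:
        fused = "gate_up_proj"
        parts = [("gate_proj", intermediate_size), ("up_proj", intermediate_size)]
    else:
        return {name: rows}  # already HF-style; pass through unchanged
    if sum(size for _, size in parts) != len(rows):
        raise ValueError(f"{name}: fused row count does not match config")
    out, start = {}, 0
    for part, size in parts:
        out[name.replace(fused, part)] = rows[start:start + size]
        start += size
    return out
```

With grouped-query attention (num_kv_heads < num_heads), the k and v slices are smaller than q, which is why the split sizes must come from the model config rather than an even three-way split.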

Memory Management During Updates

Model Sharding (Megatron/TP)

When Megatron-style tensor parallelism (TP) is used, each rank holds only a shard of the partitioned parameters, so the system must all-gather those shards before synchronizing weights to the inference engine. all_gather_param gathers sharded tensors along their partition dimension to reconstruct the full weights areal/engine/megatron_utils/megatron.py95-152.

Sources: areal/engine/megatron_utils/megatron.py95-152 areal/engine/megatron_utils/megatron.py26-40
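Once the shards are gathered, reconstruction is a concatenation along the partition dimension. The sketch assumes 2-D nested lists as tensors, with partition dim 0 for column-parallel layers (output rows split across ranks) and dim 1 for row-parallel layers (input columns split); gather_along_dim is a hypothetical helper, not all_gather_param itself, and it skips the collective step.

```python
# Sketch of rebuilding a full weight from already-gathered TP shards.
# Assumptions: nested lists as tensors; dim 0 = column-parallel, dim 1 = row-parallel.

def gather_along_dim(shards: list, partition_dim: int) -> list:
    if partition_dim == 0:
        # Stack row blocks: rank 0's rows first, then rank 1's, and so on.
        return [row for shard in shards for row in shard]
    if partition_dim == 1:
        num_rows = len(shards[0])
        if any(len(s) != num_rows for s in shards):
            raise ValueError("row-parallel shards must share the row count")
        # Concatenate each row's column blocks in rank order.
        return [[v for shard in shards for v in shard[i]] for i in range(num_rows)]
    raise ValueError(f"unsupported partition_dim: {partition_dim}")
```

Using the wrong partition dimension still produces a tensor, just with the wrong shape or a scrambled layout, which is why the dimension must be read from each parameter rather than assumed.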

FP8 Support

AReaL also supports weight synchronization for FP8-quantized models. _all_gather_fp8_tensor_and_concat gathers and concatenates both the row-wise data and the row-wise inverse scales of Float8BlockwiseQTensor parameters areal/engine/megatron_utils/megatron.py63-91.

Sources: areal/engine/megatron_utils/megatron.py63-91 areal/engine/megatron_utils/megatron.py105-110
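Gathering a blockwise-quantized tensor means gathering two things, the quantized payload and its inverse scales, and keeping them aligned. The sketch uses a simplified layout, assumed for illustration: floats stand in for FP8 values, shards split dim 0, and each inverse-scale entry covers block_rows consecutive rows (real blockwise formats may also block along columns). Both helpers are hypothetical.

```python
# Sketch: gather blockwise-quantized shards along dim 0, then dequantize.
# Assumptions: floats stand in for FP8 payloads; one scale_inv entry per
# block_rows rows; real formats may also block along columns.

def all_gather_blockwise(shards: list, block_rows: int) -> tuple:
    data, scale_inv = [], []
    for shard_data, shard_scale_inv in shards:
        # Concatenating scales is only valid if shard boundaries align with blocks.
        if len(shard_data) % block_rows:
            raise ValueError("shard rows must be a multiple of block_rows")
        if len(shard_scale_inv) != len(shard_data) // block_rows:
            raise ValueError("expected one scale_inv entry per row block")
        data.extend(shard_data)
        scale_inv.extend(shard_scale_inv)
    return data, scale_inv


def dequantize(data: list, scale_inv: list, block_rows: int) -> list:
    # Each row is rescaled by the inverse scale of the block it belongs to.
    return [[v * scale_inv[r // block_rows] for v in row] for r, row in enumerate(data)]
```

Concatenating the data alone would silently pair rows with the wrong scales; gathering the scales in the same rank order keeps each block matched to its own scale.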


URL: https://deepwiki.com/inclusionAI/AReaL/3.6-weight-synchronization