Last indexed: 7 May 2026 (2e12c1)

Setup and Installation

This page covers the complete installation process for AReaL, including hardware prerequisites, environment setup using uv, Docker usage, NPU support, and environment validation. For conceptual information about AReaL's purpose and features, see 1.1. What is AReaL. For step-by-step instructions on running your first training job, see 1.3. Quick Start Guide.

Prerequisites

Hardware Requirements

The following configuration is optimized for large-scale reinforcement learning workloads using AReaL's asynchronous rollout architecture.

Component	Specification (NVIDIA)	Specification (Ascend NPU)
Accelerator	8x H800 per node (docs/en/tutorial/installation.md:9)	16x NPU per node (docs/en/tutorial/installation_npu.md:9)
CPU	64 cores per node (docs/en/tutorial/installation.md:10)	64 cores per node (docs/en/tutorial/installation_npu.md:10)
Memory	1TB per node (docs/en/tutorial/installation.md:11)	1TB per node (docs/en/tutorial/installation_npu.md:11)
Network	NVSwitch + RoCE 3.2 Tbps (docs/en/tutorial/installation.md:12)	RoCE 3.2 Tbps (docs/en/tutorial/installation_npu.md:12)
Local Storage	1TB for single-node runs (docs/en/tutorial/installation.md:14)	1TB for single-node runs (docs/en/tutorial/installation_npu.md:14)
Shared Storage	10TB NAS for distributed runs (docs/en/tutorial/installation.md:15)	10TB NAS for distributed runs (docs/en/tutorial/installation_npu.md:15)

Sources: docs/en/tutorial/installation.md:5-15, docs/en/tutorial/installation_npu.md:5-15

Software Requirements

Component	Version	Notes
Operating System	Ubuntu 22.04 / CentOS 7	Linux x86_64 is the primary target (pyproject.toml:35)
Python	3.11 - 3.12	Enforced by `requires-python` (pyproject.toml:10)
NVIDIA Driver	550.127.08	Tested version (docs/en/tutorial/installation.md:22)
CUDA	12.8 / 12.9	Required for training/inference backends (Dockerfile:63)
`uv`	0.9.18+	Required for dependency management (pyproject.toml:2)

Sources: pyproject.toml:2-10, Dockerfile:9-63, docs/en/tutorial/installation.md:19-27
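
The Python and `uv` rows of the table can be checked before installing anything else. The following is a minimal sketch, not a script shipped with AReaL: the helper names are hypothetical, and it assumes that `uv --version` prints a dotted `X.Y.Z` version string.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 11)  # inclusive lower bound from requires-python
MAX_PYTHON = (3, 13)  # exclusive upper bound from requires-python
MIN_UV = (0, 9, 18)   # minimum uv version listed in the table


def python_ok(version=None):
    """Return True if the interpreter version satisfies >=3.11, <3.13."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON


def parse_version(text):
    """Extract the first X.Y.Z triple from text, or None if there is none."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def uv_ok():
    """Return True if uv is on PATH and reports at least MIN_UV."""
    if shutil.which("uv") is None:
        return False
    result = subprocess.run(["uv", "--version"], capture_output=True, text=True)
    version = parse_version(result.stdout)
    return version is not None and version >= MIN_UV


print("python:", "ok" if python_ok() else "unsupported")
print("uv:", "ok" if uv_ok() else "missing or too old")
```

The driver and CUDA rows are not covered here, since checking them depends on vendor tooling.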

Installation Decision Flow

Diagram (not reproduced here): installation decision flow, mapping high-level installation choices to the specific tools and configurations used in the codebase.

Sources: Dockerfile:1-13, pyproject.toml:148-182, docs/en/tutorial/installation.md:39-54

Option 1: Docker Runtime (Recommended)

The official Docker image includes pre-compiled CUDA packages and heavy C++ extensions (e.g., flash-attn, apex, TransformerEngine) which are slow to build from source.

Build Variants

AReaL supports two primary inference backends, selected with the VARIANT build argument in the Dockerfile (Dockerfile:1-15):

SGLang (default): uses torch==2.9.1+cu129 (Dockerfile:62-64)
vLLM: uses torch==2.10.0+cu129 (Dockerfile:62-64)
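
As an illustration of how a build argument like VARIANT is passed, the sketch below assembles a `docker build` command line for each backend. The lowercase values `sglang` and `vllm` and the `areal-local` tag prefix are assumptions made for the example; check the Dockerfile for the accepted values. The script only prints the commands and does not run them.

```python
import shlex

# Torch pins per variant, as listed above.
TORCH_PIN = {"sglang": "2.9.1+cu129", "vllm": "2.10.0+cu129"}


def build_command(variant="sglang", tag_prefix="areal-local"):
    """Build a `docker build` argv for one variant.

    The variant strings and the tag prefix are illustrative assumptions.
    """
    if variant not in TORCH_PIN:
        raise ValueError(f"unknown variant: {variant!r}")
    return [
        "docker", "build",
        "--build-arg", f"VARIANT={variant}",
        "-t", f"{tag_prefix}:{variant}",
        ".",
    ]


for variant, torch_version in TORCH_PIN.items():
    print(f"# {variant}: torch=={torch_version}")
    print(shlex.join(build_command(variant)))
```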

Launching the Container

The container needs a large shared-memory allocation and GPU access for distributed training.
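
A minimal sketch of what such a launch could look like, using the standard `docker run` flags `--gpus` and `--shm-size`. The image name, the `64g` shared-memory size, and the mounted path are placeholders rather than values from the AReaL documentation; see docs/en/tutorial/installation.md for the official command.

```python
import shlex


def run_command(image, shm_size="64g", mounts=()):
    """Build a `docker run` argv with GPU access and an enlarged /dev/shm.

    image, shm_size and mounts are caller-supplied placeholders.
    """
    argv = ["docker", "run", "-it", "--rm",
            "--gpus", "all",           # expose all GPUs to the container
            "--shm-size", shm_size]    # enlarge shared memory
    for host_path in mounts:
        # Mount each host path at the same location inside the container.
        argv += ["-v", f"{host_path}:{host_path}"]
    return argv + [image, "/bin/bash"]


print(shlex.join(run_command("areal-local:sglang", mounts=["/data"])))
```

Building the command as a list keeps quoting explicit and lets the same helper feed `subprocess.run` once the placeholders are replaced with real values.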

Sources: docs/en/tutorial/installation.md:45-50, Dockerfile:1-15

Option 2: Custom Environment with `uv`

For development or non-containerized deployments, use the uv package manager. uv handles the complex dependency resolution and platform markers defined in pyproject.toml (pyproject.toml:220-228).

Dependency Synchronization

Use uv sync with optional extras to install the correct backend:

uv sync --extra cuda: installs the cuda-train packages (Megatron, TMS), sglang, and flash-attn (pyproject.toml:176-182).
vLLM setup: because sglang and vllm pin mutually incompatible torch versions, the vLLM variant requires swapping in the alternate project file (pyproject.vllm.toml:3-15).
Syncing lockfiles: scripts/uv_lock.sh generates and updates both uv.lock and uv.vllm.lock to keep the variants consistent (scripts/uv_lock.sh:1-51).
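
The sketch below shows one plausible reading of "swapping the project file": copying pyproject.vllm.toml and uv.vllm.lock over the default files while keeping backups. This is an assumption for illustration, not the documented procedure, and `swap_in_vllm` is a hypothetical helper. It is demonstrated on a temporary directory so that no real checkout is modified.

```python
import shutil
import tempfile
from pathlib import Path

SYNC_CUDA = ["uv", "sync", "--extra", "cuda"]  # command named in the list above


def swap_in_vllm(root):
    """Copy the vLLM variant files over the defaults, keeping .bak backups.

    Assumption: "swapping" means replacing pyproject.toml and uv.lock with
    their vLLM counterparts. Returns the names of the files that were replaced.
    """
    root = Path(root)
    replaced = []
    for variant, default in (("pyproject.vllm.toml", "pyproject.toml"),
                             ("uv.vllm.lock", "uv.lock")):
        src, dst = root / variant, root / default
        if not src.exists():
            continue  # nothing to swap in for this file
        if dst.exists():
            shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
        shutil.copy2(src, dst)
        replaced.append(default)
    return replaced


# Demonstrate on a throwaway directory rather than a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "pyproject.toml").write_text("# default variant\n")
    Path(tmp, "pyproject.vllm.toml").write_text("# vllm variant\n")
    print(swap_in_vllm(tmp))
    print(Path(tmp, "pyproject.toml").read_text().strip())
```

Keeping a backup makes it straightforward to return to the default SGLang configuration afterwards.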

Pre-commit Hooks

After environment setup, install the pre-commit hooks so that code-quality checks run on every commit.
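
A short sketch of the hook setup, assuming the standard `pre-commit` CLI is installed in the project environment. Prefixing the commands with `uv run` is an assumption about how the environment is entered, and the script only prints the commands.

```python
import shlex
import shutil

HOOK_COMMANDS = [
    ["pre-commit", "install"],             # register the git hook
    ["pre-commit", "run", "--all-files"],  # optional: run every hook once
]


def hook_plan(use_uv=True):
    """Return the hook commands, optionally prefixed with `uv run`."""
    prefix = ["uv", "run"] if use_uv else []
    return [prefix + cmd for cmd in HOOK_COMMANDS]


for cmd in hook_plan(use_uv=shutil.which("uv") is not None):
    print(shlex.join(cmd))
```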

Sources: pyproject.toml:193-210, docs/en/tutorial/installation.md:110-118, scripts/uv_lock.sh:1-51

NPU Installation (Ascend)

NPU support requires specific images and CANN versions. Currently, the fsdp training engine and the vllm rollout engine (via vLLM-Ascend) are supported (docs/en/tutorial/installation_npu.md:144-146).

NPU Docker Setup

Use the dedicated images for Ascend hardware (A2 or A3 variants).

Sources: docs/en/tutorial/installation_npu.md:42-109

Package Structure and Build System

Diagram (not reproduced here): mapping of the logical dependency groups to the pyproject.toml structure and the Docker build stages.

Manual C++ Extension Installation

If not using Docker, several optimized kernels must be built from source:

grouped_gemm: Required for MoE models (Dockerfile:80-81)
NVIDIA apex: Fused Adam and mixed precision utilities (Dockerfile:84-86)
TransformerEngine: FP8 training and optimized GEMMs (Dockerfile:89-90)
DeepSeek-V3 Ops: FlashMLA, DeepGEMM, and DeepEP (Dockerfile:93-112)

Sources: Dockerfile:61-165, pyproject.toml:148-182, uv.lock:22-37

Installation Validation

Verify your installation using the provided scripts. These check for package imports, version compatibility, and accelerator availability.

Automated Checks

python3 areal/tools/validate_installation.py: Basic check for core dependencies.
areal/tools/check_pyproject_consistency.py: Checks that pyproject.toml and its vLLM variant stay consistent. It identifies "escapable" packages, such as torch or vllm, that are allowed to differ between the two files (areal/tools/check_pyproject_consistency.py:1-46).
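
The idea behind the consistency check can be sketched as follows. This is not the code in check_pyproject_consistency.py: the function names are hypothetical, the escapable set contains only the two examples named above, and the numpy and requests entries are illustrative data rather than AReaL's actual pins.

```python
import re

ESCAPABLE = {"torch", "vllm"}  # examples from the text; the real set may differ


def package_name(requirement):
    """Return the normalized distribution name at the start of a requirement."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Normalize runs of '-', '_' and '.' to a single '-' and lowercase.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def find_mismatches(base, variant, escapable=ESCAPABLE):
    """Return sorted names whose requirements differ and are not escapable."""
    base_map = {package_name(r): r.strip() for r in base}
    variant_map = {package_name(r): r.strip() for r in variant}
    names = (base_map.keys() | variant_map.keys()) - set(escapable)
    return sorted(n for n in names if base_map.get(n) != variant_map.get(n))


# Illustrative dependency lists; only the torch pins come from this page.
base = ["torch==2.9.1+cu129", "numpy>=1.26", "requests>=2.31"]
variant = ["torch==2.10.0+cu129", "numpy>=2.0", "requests>=2.31"]
print(find_mismatches(base, variant))  # ['numpy']
```

Here torch differs between the two lists but is ignored because it is escapable, while the differing numpy constraint is reported.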

Hardware Check

Ensure your GPU is visible to PyTorch.
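
A minimal sketch of such a check, using only the standard `torch.cuda` API. It covers CUDA devices only and does not probe Ascend NPUs. The helper name is hypothetical.

```python
def describe_accelerators():
    """Return a one-line summary of the CUDA devices PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no CUDA device visible"
    names = [torch.cuda.get_device_name(i)
             for i in range(torch.cuda.device_count())]
    return f"torch {torch.__version__}: {len(names)} CUDA device(s): {', '.join(names)}"


print(describe_accelerators())
```

If no device is reported inside a container, a common cause is launching it without GPU access.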

Sources: docs/en/tutorial/installation.md:180-195, areal/tools/check_pyproject_consistency.py:1-46

Versioning and Metadata

The system version and project metadata are managed through pyproject.toml.

Metadata Field	Code Reference	Description
`version`	`pyproject.toml:11`	Semantic version (e.g., `1.0.4`)
`requires-python`	`pyproject.toml:10`	Version constraint `>=3.11, <3.13`
`dependencies`	`pyproject.toml:43-139`	Core project requirements

Sources: pyproject.toml:5-139

URL: https://deepwiki.com/inclusionAI/AReaL/1.2-setup-and-installation