URL: https://deepwiki.com/inclusionAI/AReaL/12.3-docker-images

Last indexed: 7 May 2026 (2e12c1)

Docker Images

This document describes the Docker runtime images used by AReaL, including their multi-variant architecture, build process, CUDA package compilation, and usage in CI/CD workflows. The Docker images provide a pre-configured environment with all dependencies, optimized CUDA kernels, and GPU runtime capabilities for both training and inference.


Image Overview

AReaL provides Docker runtime images that package the complete execution environment. Unlike standard installations, the Docker images include pre-compiled C++ extensions and heavy CUDA dependencies that are often slow to build from source.

Key Components

  • Base OS: Ubuntu-based image with CUDA 12.9 support, inherited from the SGLang base image (Dockerfile:12)
  • Python Environment: Python 3.11+ managed via uv with bytecode compilation enabled (Dockerfile:36-163, pyproject.toml:10)
  • Inference Backends: Mutually exclusive variants for sglang or vllm (Dockerfile:1-15)
  • Compiled Extensions: Pre-installed flash-attn (v2 and v3), apex, TransformerEngine, DeepEP, DeepGEMM, and FlashMLA (Dockerfile:83-158)
  • System Libraries: InfiniBand/RDMA support (libibverbs, librdmacm) and build and system tools (kmod, cmake) (Dockerfile:22-30)

Sources: Dockerfile:1-163, pyproject.toml:1-13, docs/en/tutorial/installation.md:39-58


Multi-Variant Architecture

The AReaL Docker system uses a single Dockerfile to produce two distinct image variants. The VARIANT build argument selects the variant and determines which versions of PyTorch and which inference backend are installed. Two variants are needed because sglang and vllm pin mutually incompatible torch and torchao versions (pyproject.vllm.toml:3-5, areal/tools/check_pyproject_consistency.py:28-46).
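To make the role of the VARIANT build argument concrete, here is a minimal sketch that assembles a `docker build` invocation for one variant. The `--build-arg` and `-t` flags are standard Docker CLI options; the local tag `areal-runtime:local` and the helper name `docker_build_command` are hypothetical, and the project's own build scripts may pass further arguments.

```python
import shlex

# The two variants named in this document.
VALID_VARIANTS = ("sglang", "vllm")


def docker_build_command(variant: str = "sglang",
                         tag: str = "areal-runtime:local") -> list[str]:
    """Return an argv list for `docker build` that selects one image variant.

    `tag` is a hypothetical local tag chosen for illustration only.
    """
    if variant not in VALID_VARIANTS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {VALID_VARIANTS}"
        )
    return [
        "docker", "build",
        "--build-arg", f"VARIANT={variant}",  # selects the torch/backend pins
        "-t", tag,
        ".",  # build context: the directory containing the Dockerfile
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(docker_build_command("vllm", "areal-runtime:vllm-local")))
```

Returning an argv list rather than a shell string keeps the variant name out of shell parsing and lets the caller hand the list directly to `subprocess.run`.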

Variant Comparison

Feature         | sglang Variant (Default)                   | vllm Variant
----------------|--------------------------------------------|--------------------------------------------
Base Image      | lmsysorg/sglang:v0.5.10.post1-runtime      | lmsysorg/sglang:v0.5.10.post1-runtime
PyTorch Version | 2.9.1+cu129 (Dockerfile:62)                | 2.10.0+cu129 (Dockerfile:62)
Primary Backend | sglang (pyproject.toml:148)                | vllm (pyproject.vllm.toml:161)
Flash Attention | Linked against Torch 2.9 (Dockerfile:124)  | Linked against Torch 2.10 (Dockerfile:124)
Lock File       | uv.lock (scripts/uv_lock.sh:9)             | uv.vllm.lock (scripts/uv_lock.sh:10)
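The comparison can also be read as a small lookup structure. The sketch below encodes the torch pin, backend, and lock file for each variant using the values from the table; the `VariantSpec` class and `resolve_variant` helper are hypothetical names for illustration and do not appear in the AReaL repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSpec:
    """Per-variant settings, as listed in the comparison table."""
    torch_version: str
    backend: str
    lock_file: str


# Values copied from the variant comparison in this document.
VARIANTS = {
    "sglang": VariantSpec("2.9.1+cu129", "sglang", "uv.lock"),
    "vllm": VariantSpec("2.10.0+cu129", "vllm", "uv.vllm.lock"),
}


def resolve_variant(name: str = "sglang") -> VariantSpec:
    """Look up a variant by name; sglang is the default variant."""
    try:
        return VARIANTS[name]
    except KeyError:
        raise ValueError(
            f"unknown variant {name!r}; expected one of {sorted(VARIANTS)}"
        ) from None
```

Both variants share the same base image, so only the fields that actually differ are modelled here.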

Natural Language to Code Entity Mapping: Build Variants

The following diagram shows how the VARIANT argument flows through the Dockerfile to configure the environment and how the build system selects dependencies.

Diagram: Build Variant Orchestration


Sources: Dockerfile:1-15, Dockerfile:61-64, Dockerfile:121-128, pyproject.vllm.toml:1-15, areal/tools/check_pyproject_consistency.py:28-50


CUDA Package Compilation

A critical role of the Docker image is providing pre-compiled binaries for complex CUDA extensions. The build process is structured into stages to maximize cache efficiency.

Stage 1: Base Torch and Build Tools

The image first pins the variant-specific PyTorch version and installs build tools such as cmake, ccache, and uv (Dockerfile:22-64).

Stage 2: Heavy C++ Dependencies

Packages that take a long time to compile or that target specific GPU architectures (SM80, SM89, SM90) are installed before the main project code, so that changes to the project code do not invalidate their cached layers and trigger recompilation (Dockerfile:48-81).

  • NVIDIA Apex: Compiled with APEX_CUDA_EXT=1 for fused kernels (Dockerfile:84-86)
  • Transformer Engine: For FP8 training support (Dockerfile:89-90)
  • DeepSeek-V3 Kernels: DeepEP, DeepGEMM, and FlashMLA
  • Flash Attention: Instead of compiling from source (which takes about 30 minutes), the Dockerfile downloads, repacks, and installs pre-built wheels that match the variant's Torch version (2.9 or 2.10) (Dockerfile:121-158)
  • Causal Conv1d: Required for models like Qwen-3.5 (Dockerfile:115-118)
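Two details in this stage lend themselves to a short illustration: expressing the target architectures in the form PyTorch extension builds commonly read from the `TORCH_CUDA_ARCH_LIST` environment variable, and reducing a full torch version to the major.minor pair that a pre-built wheel must match. The sketch below is an assumption about how such values could be derived; whether the Dockerfile sets `TORCH_CUDA_ARCH_LIST` or selects wheels this way is not stated here, and both helper names are hypothetical.

```python
def cuda_arch_list(sm_targets: list[str]) -> str:
    """Convert names like 'SM80' into the '8.0;8.9;9.0' form.

    This is the semicolon-separated format accepted by the
    TORCH_CUDA_ARCH_LIST environment variable.
    """
    parts = []
    for sm in sm_targets:
        digits = sm.upper().removeprefix("SM")
        if not digits.isdigit() or len(digits) < 2:
            raise ValueError(f"cannot parse architecture {sm!r}")
        # The last digit is the minor revision; everything before it is the major.
        parts.append(f"{digits[:-1]}.{digits[-1]}")
    return ";".join(parts)


def torch_minor(version: str) -> str:
    """Reduce a version such as '2.9.1+cu129' to its 'major.minor' prefix."""
    release = version.split("+", 1)[0]  # drop the local '+cu129' suffix
    major, minor = release.split(".")[:2]
    return f"{major}.{minor}"


if __name__ == "__main__":
    print(cuda_arch_list(["SM80", "SM89", "SM90"]))
    print(torch_minor("2.10.0+cu129"))
```

Splitting on the dot rather than treating the version as a decimal number matters for the vllm variant, where "2.10" must not be confused with "2.1".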

Sources: Dockerfile:22-158, docs/en/tutorial/installation.md:126-162


Image Tags and Versioning

Images are hosted on GitHub Container Registry (GHCR) at ghcr.io/inclusionai/areal-runtime (docs/en/tutorial/installation.md:27).

Tag Structure

Consistency Management

Because the project maintains two dependency manifests (pyproject.toml and pyproject.vllm.toml), a consistency checker, areal/tools/check_pyproject_consistency.py, ensures that non-inference dependencies remain identical across both variants (areal/tools/check_pyproject_consistency.py:6-9). The ESCAPABLE_PACKAGES list defines which packages (such as torch, sglang, and vllm) are permitted to differ (areal/tools/check_pyproject_consistency.py:32-46).
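The idea behind such a check can be sketched in a few lines: index each dependency list by package name, skip the names that are allowed to differ, and report everything else that does not match exactly. This is an illustrative reimplementation under stated assumptions, not the code in check_pyproject_consistency.py; the helper names are hypothetical, and the `ESCAPABLE_PACKAGES` set below contains only the three packages named above, so the real list may be longer.

```python
import re

# Illustrative subset: only the packages this document names as allowed to differ.
ESCAPABLE_PACKAGES = frozenset({"torch", "sglang", "vllm"})

# Leading distribution name of a requirement string such as "numpy>=1.26".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    """Normalize a package name (lowercase, runs of -, _, . become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _by_name(requirements: list[str]) -> dict[str, str]:
    """Map normalized package name to its full requirement string."""
    indexed = {}
    for req in requirements:
        match = _NAME_RE.match(req)
        if match is None:
            raise ValueError(f"cannot parse requirement {req!r}")
        indexed[_normalize(match.group(1))] = req.strip()
    return indexed


def find_inconsistencies(default_deps: list[str],
                         vllm_deps: list[str],
                         escapable=ESCAPABLE_PACKAGES) -> list[str]:
    """Return names whose requirement differs and is not allowed to differ."""
    a, b = _by_name(default_deps), _by_name(vllm_deps)
    problems = []
    for name in sorted(set(a) | set(b)):
        if name in escapable:
            continue
        # A package missing from one side counts as a mismatch too.
        if a.get(name) != b.get(name):
            problems.append(name)
    return problems
```

An empty result means the two lists agree on every package outside the allowed set, which is the property the checker is described as enforcing.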

Sources: docs/en/tutorial/installation.md:27-58, areal/tools/check_pyproject_consistency.py:1-50, pyproject.toml:11


Runtime and Orchestration

System Mapping: Docker Runtime to Execution

The following diagram bridges the Docker container setup with the training environment configuration.

Diagram: Runtime Environment Mapping


Usage and Deployment

The Docker image provides everything needed to start Ray clusters or local training jobs without additional installation steps. It is the recommended way to run AReaL because it avoids conflicts with the local environment (docs/en/tutorial/installation.md:39-43).
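As a rough illustration of launching the runtime image, the sketch below assembles a `docker run` invocation with GPU access. The repository path is the one given in this document and the flags are standard Docker CLI options (`--gpus all` requires the NVIDIA Container Toolkit on the host). The image tag is deliberately left to the caller because the tag scheme is not reproduced here; the shared-memory size, the /workspace mount point, and the helper names are assumptions made for the example.

```python
from typing import Optional

# Registry path given in this document.
REPOSITORY = "ghcr.io/inclusionai/areal-runtime"


def runtime_image(tag: str) -> str:
    """Build a full image reference; the tag must be supplied by the caller."""
    if not tag:
        raise ValueError("an explicit image tag is required")
    return f"{REPOSITORY}:{tag}"


def docker_run_command(tag: str,
                       host_dir: Optional[str] = None,
                       shm_size: str = "32g") -> list[str]:
    """Return an argv list for an interactive, GPU-enabled container.

    `shm_size` and the /workspace mount point are illustrative defaults.
    """
    cmd = [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",         # expose all host GPUs
        "--network", "host",     # share the host network stack
        "--shm-size", shm_size,  # enlarge /dev/shm for multi-process workloads
    ]
    if host_dir is not None:
        # Mount the caller's checkout and start in it.
        cmd += ["-v", f"{host_dir}:/workspace", "-w", "/workspace"]
    cmd += [runtime_image(tag), "bash"]
    return cmd
```

The returned list can be passed to `subprocess.run` once a real tag from the registry has been chosen.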


Sources: docs/en/tutorial/installation.md:39-58, Dockerfile:52-53, scripts/uv_lock.sh:7-10