URL: https://deepwiki.com/inclusionAI/AReaL/12.4-cicd-workflows

Last indexed: 7 May 2026 (2e12c1)

CI/CD Workflows

This page documents AReaL's continuous integration and continuous deployment (CI/CD) infrastructure, including GitHub Actions workflows, automated testing, Docker image builds, and release management.

Overview

AReaL's CI/CD system comprises:

  • GitHub Actions workflows for automated testing, formatting checks, and Docker image builds.
  • GCP-based test infrastructure with GPU runners for integration tests.
  • Automated Docker image publishing to GitHub Container Registry (GHCR) for both development and official releases.
  • Git-based version tracking with metadata integration.

Workflows are triggered by pull requests, pushes, releases, schedules, or manual dispatch, depending on the workflow (see the trigger matrix below). Together they gate changes on automated tests and keep the development and release images reproducible.

Sources: .github/workflows/test-areal.yml:1-35, .github/workflows/build-docker-image.yml:1-20

GitHub Actions Workflows

Workflow Trigger Matrix

Workflow File | Trigger Events | Runner Type | Purpose
--- | --- | --- | ---
test-areal.yml | PR with safe-to-test label, workflow_dispatch, workflow_call | GCP a2-highgpu-2g (2x A100) | Unit and integration tests (SGLang/vLLM variants)
build-docker-image.yml | workflow_dispatch | GCP areal-docker-builder → ephemeral test runners | Development Docker image build, test, and promotion to :dev
tag-release-image.yml | release (created), workflow_dispatch | GCP areal-docker-builder | Build and tag official release images (e.g., v1.0.4-sglang)
deploy-docs.yml | Push to main/add_chinese_doc | ubuntu-latest | Jupyter Book documentation build/deploy
install-test.yml | PR/Push (main) affecting pyproject.toml or areal/ | ubuntu-latest, macos-latest | Validates package installation and core imports
stale-issues.yml | Daily schedule (0:00 UTC), workflow_dispatch | ubuntu-latest | Automatic issue/PR staleness management
runner-heartbeat.yml | Weekly schedule (every 7 days), workflow_dispatch | GCP areal-docker-builder | Self-hosted runner health check
bake-gcp-image.yml | workflow_dispatch, workflow_call | GCP a2-highgpu-2g | Pre-pulls Docker images into a new GCP OS image

Sources: .github/workflows/test-areal.yml:3-35, .github/workflows/build-docker-image.yml:3-8, .github/workflows/tag-release-image.yml:3-12, .github/workflows/deploy-docs.yml:3-6, .github/workflows/install-test.yml:3-23, .github/workflows/bake-gcp-image.yml:3-31

CI/CD Pipeline Flow

The following diagram maps the lifecycle of a code change, described in natural language, to the specific GitHub Actions entities and GCP resources.

[Diagram: Natural Language Space to Code Entity Space: Development Pipeline]

Sources: .github/workflows/test-areal.yml:45-238, .github/workflows/build-docker-image.yml:21-218, .github/workflows/tag-release-image.yml:25-220, .github/workflows/install-test.yml:28-182

Test Workflow Architecture

Overview

The test-areal.yml workflow provisions ephemeral GCP instances with GPU hardware. It supports a matrix of inference engine variants (sglang, vllm) and test types (unit, integration) to ensure compatibility across different backends.

Sources: .github/workflows/test-areal.yml:46-75
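
As an illustration of the variant and test-type matrix described above, the following minimal Python sketch enumerates the combinations. It assumes the workflow runs the full cross product of the two axes, which the source does not state explicitly; the function name and the job_name format are hypothetical and do not come from test-areal.yml.

```python
from itertools import product

# Axes named in the workflow description.
ENGINE_VARIANTS = ("sglang", "vllm")
TEST_TYPES = ("unit", "integration")


def build_test_matrix(variants=ENGINE_VARIANTS, test_types=TEST_TYPES):
    """Return one job description per (variant, test type) combination.

    Assumption: every variant is paired with every test type.
    The job_name format is illustrative only.
    """
    return [
        {"variant": variant, "test_type": test_type, "job_name": f"{test_type}-{variant}"}
        for variant, test_type in product(variants, test_types)
    ]


if __name__ == "__main__":
    for job in build_test_matrix():
        print(job["job_name"])
```

Under that assumption, two variants and two test types yield four jobs, each of which would need its own ephemeral GPU runner.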

GCP Runner Provisioning

The workflow uses a custom OS image defined in env.GCP_OS_IMAGE (e.g., areal-cicd-test-20260506-432).

Startup Script Behavior: The rendered startup script (.github/workflows/test-areal.yml:124-189) performs the following steps:

  1. Starts a Docker container named areal-cicd using the specified CONTAINER_IMAGE with --runtime=nvidia --gpus all.
  2. Configures a large shared memory size (--shm-size="58205394001.92b", roughly 58 GB), which is required for distributed training.
  3. Downloads and installs the GitHub Actions runner binary inside the container (.github/workflows/test-areal.yml:176-177).
  4. Registers the runner as ephemeral using a registration token fetched via github-script (.github/workflows/test-areal.yml:180-185).
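
The container start in step 1 and the shared memory setting in step 2 can be sketched as a command builder. This is a minimal illustration, not the workflow's actual script: the -d flag and the trailing sleep infinity command are assumptions made to keep the container alive, and the image name in the usage example is a placeholder.

```python
import shlex


def docker_run_command(container_image, shm_size="58205394001.92b", name="areal-cicd"):
    """Build a `docker run` argument list matching the flags listed above.

    Assumptions (not taken from the workflow file): the container is
    started detached (-d) and kept alive with `sleep infinity`.
    """
    return [
        "docker", "run", "-d",
        "--name", name,
        "--runtime=nvidia",
        "--gpus", "all",
        f"--shm-size={shm_size}",
        container_image,
        "sleep", "infinity",
    ]


if __name__ == "__main__":
    # Placeholder image reference, for illustration only.
    print(shlex.join(docker_run_command("ghcr.io/example/areal-runtime:dev")))
```

Building the command as a list rather than a single string avoids shell-quoting problems when the image reference or the name comes from workflow inputs.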

Resource Specifications:

  • Machine type: a2-highgpu-2g (2x NVIDIA A100 GPUs).
  • Disk size: 2000GB.
  • Max run duration: 1 hour (enforced by GCP --max-run-duration).

Sources: .github/workflows/test-areal.yml:40-44, .github/workflows/test-areal.yml:207-238
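
The resource specifications above map naturally onto a gcloud compute instances create invocation. The sketch below is an assumption about how such a command could look, not a copy of the workflow: it treats the 2000GB disk as the boot disk, assumes the instance is deleted when the run duration expires, and takes the instance name, image, and zone as caller-supplied placeholders.

```python
def gcloud_create_command(instance_name, os_image, zone):
    """Build a `gcloud compute instances create` argument list.

    Assumptions: the 2000GB disk is the boot disk, and the instance is
    deleted (rather than stopped) once --max-run-duration elapses.
    """
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        f"--zone={zone}",
        "--machine-type=a2-highgpu-2g",   # 2x NVIDIA A100
        f"--image={os_image}",            # e.g. the value of env.GCP_OS_IMAGE
        "--boot-disk-size=2000GB",
        "--max-run-duration=1h",          # hard cap on instance lifetime
        "--instance-termination-action=DELETE",
    ]


if __name__ == "__main__":
    # Instance name and zone are hypothetical values for illustration.
    print(" ".join(gcloud_create_command(
        "areal-test-runner", "areal-cicd-test-20260506-432", "us-central1-a")))
```

The one-hour cap acts as a safety net: even if a job hangs or the runner fails to deregister, the instance does not keep accruing GPU cost.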

Package Installation Validation

The install-test.yml workflow ensures the package remains installable across environments without requiring GPU hardware for the initial check.

  • Basic Installation: Validates uv sync and core imports (TrainController, RolloutController, WorkflowExecutor, StalenessManager) on Ubuntu and macOS (.github/workflows/install-test.yml:53-73).
  • CUDA Extras: Tests installation of the sglang, vllm, and megatron extras on Linux. It specifically handles the conflict between sglang and vllm by swapping the pyproject.toml and uv.lock files (.github/workflows/install-test.yml:82-136).
  • Docker Validation: Runs areal/tools/validate_docker_installation.py within the areal-runtime container to verify the pre-installed environment (.github/workflows/install-test.yml:137-182).

Sources: .github/workflows/install-test.yml:1-182
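
The file swap used to work around the sglang/vllm conflict can be illustrated with a small context manager. This is a sketch under stated assumptions: the source does not say where the alternate pyproject.toml and uv.lock live or whether the originals are restored afterwards, so the variant_dir layout and the restore step here are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def swapped_project_files(project_dir, variant_dir, names=("pyproject.toml", "uv.lock")):
    """Temporarily replace project files with a variant's copies.

    Assumption: variant_dir holds alternate versions of the same file
    names. The originals are backed up first and always restored.
    """
    project_dir, variant_dir = Path(project_dir), Path(variant_dir)
    with tempfile.TemporaryDirectory() as backup:
        backup_dir = Path(backup)
        # Back up every original before touching anything.
        for name in names:
            shutil.copy2(project_dir / name, backup_dir / name)
        try:
            for name in names:
                shutil.copy2(variant_dir / name, project_dir / name)
            yield project_dir
        finally:
            # Restore the originals even if the install step failed.
            for name in names:
                shutil.copy2(backup_dir / name, project_dir / name)
```

Inside the with block, an install command such as uv sync would see the variant's dependency set; on exit the working tree returns to its original state, so the next variant starts from a clean baseline.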

Docker Build and Release

Development Build and Promotion

The build-docker-image.yml workflow manages the lifecycle of the areal-runtime image for development:

  1. Start Builder: Boots a persistent areal-docker-builder instance on GCP (.github/workflows/build-docker-image.yml:22-69).
  2. Build: Images for both sglang and vllm variants are built using docker/build-push-action@v7 (.github/workflows/build-docker-image.yml:135-154).
  3. Push Test: Images are pushed to GHCR with the :test tag.
  4. Trigger CI: The workflow calls test-areal.yml as a downstream job using the newly built :test images (.github/workflows/build-docker-image.yml:164-183).
  5. Promote: If tests pass, the :test images are pulled, retagged as :dev, and pushed back to GHCR (.github/workflows/build-docker-image.yml:184-210).
  6. Bake: Finally, it triggers bake-gcp-image.yml to create a new GCP OS image that has these :dev images pre-pulled (.github/workflows/build-docker-image.yml:211-218).

Sources: .github/workflows/build-docker-image.yml:114-218, .github/workflows/bake-gcp-image.yml:108-124
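
The test gate and the promotion step (steps 4 and 5) can be sketched as follows. The function names are hypothetical and the image path in the usage example is a placeholder; the exact tag names used for the sglang and vllm variants are not given in the source, so the sketch takes the source and target tags as parameters.

```python
def promotion_commands(image, source_tag="test", target_tag="dev"):
    """Return the docker commands that retag a tested image.

    Mirrors the pull, retag, push sequence described in step 5.
    """
    src = f"{image}:{source_tag}"
    dst = f"{image}:{target_tag}"
    return [
        ["docker", "pull", src],
        ["docker", "tag", src, dst],
        ["docker", "push", dst],
    ]


def run_pipeline(build, test, promote):
    """Run build, then test, and promote only if the tests pass.

    Each argument is a callable; `test` must return True on success.
    Returns True if promotion happened.
    """
    build()
    if not test():
        return False
    promote()
    return True


if __name__ == "__main__":
    # Placeholder image path, for illustration only.
    for cmd in promotion_commands("ghcr.io/example/areal-runtime"):
        print(" ".join(cmd))
```

Promoting by retagging rather than rebuilding means the :dev image is byte-for-byte the one that passed the tests.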

Release Image Building

The tag-release-image.yml workflow is triggered by official releases (or by manual dispatch). It extracts the version from pyproject.toml (.github/workflows/tag-release-image.yml:133-145) and builds production-ready images for both the sglang and vllm variants. Each image is tagged with the release tag (e.g., v1.0.4-sglang), and the :latest tag is updated (.github/workflows/tag-release-image.yml:158-182).

Sources: .github/workflows/tag-release-image.yml:1-220
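
The version extraction and tag naming can be illustrated with a short sketch. It is not the workflow's actual extraction logic: the line-based parser below is a simplification that only handles a plain version = "..." entry in the [project] table, and the v prefix plus variant suffix is inferred from the example tag v1.0.4-sglang. How :latest is assigned across variants is not modeled.

```python
import re


def read_project_version(pyproject_text):
    """Return the version string from the [project] table.

    Simplified line-based parser (an assumption, not a full TOML
    reader): it expects a line of the form `version = "X.Y.Z"`.
    """
    in_project = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track which table we are currently inside.
            in_project = stripped == "[project]"
            continue
        if in_project:
            match = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if match:
                return match.group(1)
    raise ValueError("no version found in [project] table")


def release_tags(version, variants=("sglang", "vllm")):
    """Return one image tag per variant, e.g. v1.0.4-sglang."""
    return [f"v{version}-{variant}" for variant in variants]


if __name__ == "__main__":
    sample = '[project]\nname = "areal"\nversion = "1.0.4"\n'
    print(release_tags(read_project_version(sample)))
```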

Version Management

The areal/version.py module provides the single source of truth for versioning by combining package metadata with Git state.

  • Package Version: Fetched via importlib.metadata.version("areal") (areal/version.py:9-15).
  • Git Metadata: The VersionInfo class executes shell commands to retrieve the current branch, short commit hash, and dirty state (areal/version.py:61-88).
  • Full Version String: Combines these elements into a descriptive string, e.g., 0.1.0-a1b2c3d-dirty (areal/version.py:43-58).

Sources: areal/version.py:1-98
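
The way package metadata and Git state combine into a full version string can be sketched as below. This is an illustrative stand-in, not the code in areal/version.py: the class name VersionSketch, the helper names, and the specific git commands are assumptions chosen to reproduce the example format 0.1.0-a1b2c3d-dirty.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


def _git(*args) -> Optional[str]:
    """Run a git command and return its trimmed output, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


@dataclass(frozen=True)
class VersionSketch:
    """Hypothetical stand-in for the VersionInfo class described above."""
    version: str
    branch: Optional[str]
    commit: Optional[str]
    dirty: bool

    @property
    def full_version(self) -> str:
        # Join the package version, short commit hash, and dirty marker.
        parts = [self.version]
        if self.commit:
            parts.append(self.commit)
        if self.dirty:
            parts.append("dirty")
        return "-".join(parts)


def collect_version(package_version: str) -> VersionSketch:
    """Gather Git metadata for the current working directory."""
    status = _git("status", "--porcelain")
    return VersionSketch(
        version=package_version,
        branch=_git("rev-parse", "--abbrev-ref", "HEAD"),
        commit=_git("rev-parse", "--short", "HEAD"),
        dirty=bool(status),  # any porcelain output means uncommitted changes
    )
```

In this sketch, running outside a Git checkout degrades to the bare package version because every Git query returns None.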

Maintenance Workflows

GCP Image Baking

The bake-gcp-image.yml workflow optimizes CI startup times by pre-pulling heavy Docker images into a GCP disk image.

[Diagram: Natural Language Space to Code Entity Space: Baking Workflow]

Sources: .github/workflows/bake-gcp-image.yml:73-132, .github/workflows/bake-gcp-image.yml:211-238

Documentation Deployment

The deploy-docs.yml workflow builds the Jupyter Book using docs/build_all.sh and deploys the resulting HTML to GitHub Pages whenever changes are pushed to main (.github/workflows/deploy-docs.yml:47-72).

Sources: .github/workflows/deploy-docs.yml:1-72