Last indexed: 7 May 2026 (2e12c1)

Local Launcher

Purpose and Scope

The Local Launcher (implemented via LocalLauncher and LocalScheduler) provides single-node execution for AReaL training and inference workflows. It manages worker subprocesses on a single GPU node, handles dynamic port allocation, and performs round-robin GPU assignment. Unlike the multi-node Ray Launcher or SLURM Launcher, the LocalLauncher is designed for simplicity and direct process management via Python's subprocess module.

The system consists of two primary layers:

LocalLauncher: A low-level process manager that handles subprocess.Popen calls, log redirection, and signal propagation.
LocalScheduler: A high-level resource manager that implements the Scheduler API, using subprocesses to instantiate workers with specific environment variables and RPC servers.
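The low-level layer described above can be illustrated with a minimal sketch of a subprocess launch with log redirection. The function name `launch_with_log`, its parameters, and the choice to append to a single log file are assumptions made for illustration; they are not taken from the AReaL source.

```python
import os
import subprocess


def launch_with_log(cmd, log_path, extra_env=None):
    """Start `cmd` as a child process, appending stdout and stderr to `log_path`.

    Hypothetical helper for illustration; not AReaL's actual API.
    """
    # Start from the parent's environment and overlay per-worker variables.
    env = os.environ.copy()
    env.update(extra_env or {})

    log_file = open(log_path, "a")
    try:
        proc = subprocess.Popen(
            cmd,
            env=env,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same log
            start_new_session=True,    # POSIX: child gets its own process group
        )
    finally:
        # The child holds its own copy of the descriptor, so the parent can close.
        log_file.close()
    return proc
```

Starting the child in its own session is one common way to make later signal delivery to the whole worker group possible; whether the real launcher does this is not stated in the text above.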


Architecture

The LocalLauncher serves as the execution engine for local experiments, while the LocalScheduler provides the abstraction needed by the AReaL controllers (like TrainController or RolloutController) to manage worker lifecycles.

Class Structure and Relationships


Natural Language to Code Entity Space: Local Execution

The following diagram maps high-level execution concepts to the specific classes and methods in the LocalLauncher and LocalScheduler implementation.


Implementation Details

Worker Lifecycle and GPU Allocation

The LocalScheduler and LocalLauncher manage the lifecycle of local processes with a focus on resource isolation within a single node.

GPU Discovery: LocalScheduler detects available GPUs via _detect_gpus(), which checks the platform-specific environment variable (e.g., CUDA_VISIBLE_DEVICES) or safely counts devices in /dev.
Round-Robin Assignment: GPUs are allocated to workers using a counter that wraps around the available device list.
Command Construction: LocalLauncher prepends environment variables and uses stdbuf -oL to ensure line-buffered output for real-time logging.
Process Monitoring: LocalScheduler performs health checks on workers by polling the underlying subprocess.Popen object and sending heartbeats via RPC.
Recursive Termination: Cleanup is handled by kill_process_tree, which ensures that when a worker is killed, all its child processes (such as inference servers or sidecars) are also cleaned up.
Colocation and Forking: LocalScheduler supports fork_workers, allowing a new worker (like a proxy server) to be spawned from an existing worker's environment, facilitating colocation on the same node and resources.
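The discovery, round-robin assignment, and command construction steps listed above might look roughly like the following sketch. All names here (`detect_gpus`, `RoundRobinGpuAllocator`, `build_worker_command`) are hypothetical, and the sketch assumes the environment variable holds plain numeric indices; the actual _detect_gpus() logic and command format in AReaL may differ.

```python
import os
import shlex


def detect_gpus(env_var="CUDA_VISIBLE_DEVICES", environ=None):
    """Parse a comma-separated device list from an environment variable.

    Assumes numeric indices only (no GPU UUIDs); returns [] if unset.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var, "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]


class RoundRobinGpuAllocator:
    """Hands out devices with a counter that wraps around the device list."""

    def __init__(self, devices):
        if not devices:
            raise ValueError("no GPU devices available")
        self._devices = list(devices)
        self._counter = 0

    def allocate(self, n):
        picked = []
        for _ in range(n):
            picked.append(self._devices[self._counter % len(self._devices)])
            self._counter += 1
        return picked


def build_worker_command(base_cmd, gpu_ids, extra_env=None):
    """Prefix a command with environment assignments and `stdbuf -oL`."""
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    env.update(extra_env or {})
    env_prefix = " ".join(f"{k}={shlex.quote(str(v))}" for k, v in env.items())
    # stdbuf -oL asks the child to line-buffer stdout so logs appear promptly.
    return f"{env_prefix} stdbuf -oL {shlex.join(base_cmd)}"
```

With three devices, a second request for two GPUs wraps around to the first device again, which means workers can share a GPU once the list is exhausted.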


Inference Server Integration

The local launcher infrastructure supports specialized wrappers for inference backends like SGLang and vLLM. These wrappers handle the specific requirements of launching multi-GPU inference servers on a local node.

SGLangServerWrapper: Manages the launch of sglang servers, including finding free ports for distributed initialization and monitoring the server process via _monitor_server_processes.
vLLMServerWrapper: Similar to SGLang, but includes specific signal handlers (SIGTERM, SIGINT) and a _cleanup_all_servers method to ensure the vLLM process tree is terminated gracefully. It also enables VLLM_ALLOW_RUNTIME_LORA_UPDATING by default.
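Finding free ports, as mentioned for the server wrappers above, is commonly done by binding to port 0 and letting the operating system choose. The sketch below shows that general technique; `find_free_ports` is a hypothetical helper and is not claimed to match how the wrappers actually select ports.

```python
import socket


def find_free_ports(count, host="127.0.0.1"):
    """Return `count` distinct TCP ports that were free at the time of the call."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((host, 0))  # port 0 lets the OS pick an unused port
            sockets.append(s)
        # Keeping every socket open until here guarantees the ports are distinct.
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()
```

Ports found this way can be claimed by another process between the check and the server's own bind, so callers typically retry the launch if binding fails.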


Data Flow: Process Launch and RPC Initialization

The following sequence shows how the LocalScheduler prepares a worker, launches it, and waits for the RPC server to become ready.
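As a rough illustration of the final waiting step only, the sketch below polls a TCP port until it accepts connections, gives up after a timeout, and stops early if the worker process has already exited. The helper name `wait_for_port` and the use of a bare TCP connect as the readiness signal are assumptions; the real scheduler may probe readiness through its RPC layer instead.

```python
import socket
import time


def wait_for_port(host, port, timeout, proc=None, poll_interval=0.1):
    """Return True once (host, port) accepts a TCP connection, False on timeout.

    If `proc` (a subprocess.Popen-like object) has exited, raise RuntimeError
    instead of waiting for a server that can no longer start.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc is not None and proc.poll() is not None:
            raise RuntimeError(f"worker exited early with code {proc.returncode}")
        try:
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            # Not listening yet (refused or timed out); retry after a short pause.
            time.sleep(poll_interval)
    return False
```

A caller would pass something like the startup_timeout value as `timeout` and treat a False result as a failed launch.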


Shared Storage Validation

Even in local execution mode, AReaL enforces shared storage checks via validate_shared_path. This ensures that the fileroot and name_resolve_root are accessible across processes and compatible with distributed scaling. The utility identifies network filesystems such as nfs, lustre, ceph, and cloud provider solutions like alinas (Alibaba Cloud).
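One way such a check could work is to look up the filesystem type of the mount point that contains a path. The sketch below does this against text in the Linux /proc/mounts format; the helper names and the exact set of filesystem types are assumptions based only on the names listed above, not the behavior of validate_shared_path itself.

```python
import posixpath

# Illustrative set built from the filesystem names mentioned in the text.
NETWORK_FS_TYPES = {"nfs", "lustre", "ceph", "alinas"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            entries.append((parts[1], parts[2]))
    return entries


def fs_type_for_path(path, mounts):
    """Return the fs type of the longest mount point that contains `path`."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_shared_path(path, mounts):
    """True if `path` lives on one of the listed network filesystems."""
    return fs_type_for_path(path, mounts) in NETWORK_FS_TYPES
```

On a Linux host, `parse_mounts(open("/proc/mounts").read())` would supply the mount table; passing it in explicitly keeps the lookup easy to exercise with fixed input.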


Configuration Reference

The behavior of the local launcher and scheduler is influenced by the following parameters:

Parameter	Description
`gpu_devices`	List of GPU indices available for allocation.
`log_dir`	Directory where worker stdout/stderr logs are stored.
`startup_timeout`	Max seconds to wait for a worker's RPC server to start.
`name_resolve_type`	Method for worker discovery (typically `nfs` for local).
`stdbuf -oL`	Forces line-buffered output in launched subprocesses.
`enable_tms_offload`	Enables Tensor Management System offloading.



URL: https://deepwiki.com/inclusionAI/AReaL/9.4-local-launcher
