The Local Launcher (implemented via LocalLauncher and LocalScheduler) provides single-node execution for AReaL training and inference workflows. It manages worker subprocesses on a single GPU node, handles dynamic port allocation, and performs round-robin GPU assignment. Unlike the multi-node Ray Launcher or SLURM Launcher, the LocalLauncher is designed for simplicity and direct process management via Python's subprocess module.
The system consists of two primary layers:
- LocalLauncher: A low-level process manager that handles subprocess.Popen calls, log redirection, and signal propagation.
- LocalScheduler: A high-level resource manager that implements the Scheduler API, using subprocesses to instantiate workers with specific environment variables and RPC servers.
The LocalLauncher serves as the execution engine for local experiments, while the LocalScheduler provides the abstraction needed by the AReaL controllers (like TrainController or RolloutController) to manage worker lifecycles.
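The low-level layer can be pictured with a minimal sketch like the one below. It is not the AReaL implementation: `launch_worker`, `run_demo`, and the `WORKER_ROLE` variable are hypothetical names used only for illustration. The sketch merges extra environment variables into the child's environment, prefixes the command with `stdbuf -oL` when that tool is installed, and redirects stdout and stderr to a log file.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def launch_worker(cmd, extra_env, log_path):
    """Start `cmd` as a subprocess with extra env vars and a log file.

    Hypothetical helper for illustration; not part of AReaL.
    """
    env = {**os.environ, **extra_env}
    # Prefix with `stdbuf -oL` (GNU coreutils) when available so that
    # programs using C stdio write stdout line by line.
    if shutil.which("stdbuf"):
        cmd = ["stdbuf", "-oL", *cmd]
    log_file = open(log_path, "ab")
    # Send both stdout and stderr of the child to the same log file.
    proc = subprocess.Popen(
        cmd, env=env, stdout=log_file, stderr=subprocess.STDOUT
    )
    return proc, log_file


def run_demo():
    """Launch a tiny child process and return its exit code and log text."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "worker.log"
        proc, log_file = launch_worker(
            [sys.executable, "-c", "import os; print(os.environ['WORKER_ROLE'])"],
            {"WORKER_ROLE": "rollout"},  # made-up variable name
            log_path,
        )
        proc.wait()
        log_file.close()
        return proc.returncode, log_path.read_text().strip()
```

Calling `run_demo()` starts one short-lived child and reads back what it logged. A real launcher would also keep the `Popen` handles around so it can forward signals and clean up later.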
The following diagram maps high-level execution concepts to the specific classes and methods in the LocalLauncher and LocalScheduler implementation.
The LocalScheduler and LocalLauncher manage the lifecycle of local processes with a focus on resource isolation within a single node.
- LocalScheduler detects available GPUs via _detect_gpus(), which checks the platform-specific environment variable (e.g., CUDA_VISIBLE_DEVICES) or safely counts devices in /dev.
- LocalLauncher prepends environment variables and uses stdbuf -oL to ensure line-buffered output for real-time logging.
- LocalScheduler performs health checks on workers by polling the underlying subprocess.Popen object and sending heartbeats via RPC.
- Worker termination goes through kill_process_tree, which ensures that when a worker is killed, all its child processes (such as inference servers or sidecars) are also cleaned up.
- LocalScheduler supports fork_workers, allowing a new worker (like a proxy server) to be spawned from an existing worker's environment, facilitating colocation on the same node and resources.
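GPU detection and round-robin assignment can be sketched as follows. This is an illustration under stated assumptions, not the body of `_detect_gpus()`: the function names, the `fallback_count` parameter, and the restriction to integer device indices are all hypothetical (real `CUDA_VISIBLE_DEVICES` values may also contain UUIDs, which this sketch does not handle).

```python
import os
from itertools import cycle, islice


def detect_gpus(env=None, fallback_count=1):
    """Return GPU indices listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper: assumes integer indices and falls back to
    range(fallback_count) when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES", "").strip()
    if visible:
        return [int(tok) for tok in visible.split(",") if tok.strip()]
    return list(range(fallback_count))


def assign_round_robin(gpu_devices, num_workers, gpus_per_worker=1):
    """Hand out GPUs to workers in order, wrapping around when exhausted."""
    pool = cycle(gpu_devices)
    # Each islice call continues from where the previous one stopped.
    return [list(islice(pool, gpus_per_worker)) for _ in range(num_workers)]
```

With two visible GPUs and three single-GPU workers, the third worker wraps around and shares the first device, which is the colocation behavior round-robin assignment implies.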
The local launcher infrastructure supports specialized wrappers for inference backends like SGLang and vLLM. These wrappers handle the specific requirements of launching multi-GPU inference servers on a local node.
- SGLangServerWrapper: Manages the launch of sglang servers, including finding free ports for distributed initialization and monitoring the server process via _monitor_server_processes.
- vLLMServerWrapper: Similar to SGLang, but includes specific signal handlers (SIGTERM, SIGINT) and a _cleanup_all_servers method to ensure the vLLM process tree is terminated gracefully. It also enables VLLM_ALLOW_RUNTIME_LORA_UPDATING by default.
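Finding free ports is commonly done by binding to port 0 and letting the operating system choose. The sketch below shows that general technique; `find_free_ports` is a hypothetical helper and is not claimed to match how the wrappers do it.

```python
import socket


def find_free_ports(count, host="127.0.0.1"):
    """Ask the OS for `count` distinct free TCP ports on `host`.

    Hypothetical helper. All sockets stay open until every port has
    been chosen, so the OS cannot hand out the same port twice.
    """
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((host, 0))  # port 0 means "pick any free port"
            sockets.append(s)
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()
```

There is an inherent race: another process may claim a port between the moment it is released here and the moment the server binds it, so callers usually retry on bind failure.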
The following sequence shows how the LocalScheduler prepares a worker, launches it, and waits for the RPC server to become ready.
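The wait-for-ready step of this sequence might look roughly like the sketch below. It assumes readiness can be approximated by a successful TCP connection to the worker's port; the actual scheduler may use a different RPC-level check. `wait_until_ready` and `poll_interval` are hypothetical names, while `startup_timeout` mirrors the configuration parameter of the same name.

```python
import socket
import subprocess
import sys
import time


def wait_until_ready(proc, host, port, startup_timeout=30.0, poll_interval=0.1):
    """Block until `host:port` accepts connections or the worker fails.

    Hypothetical helper. Raises RuntimeError if the process exits
    before becoming ready, TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + startup_timeout
    while time.monotonic() < deadline:
        # Health check: poll() returns the exit code once the child is gone.
        if proc.poll() is not None:
            raise RuntimeError(f"worker exited early with code {proc.returncode}")
        try:
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            # Not listening yet (refused or timed out); try again shortly.
            time.sleep(poll_interval)
    raise TimeoutError(f"worker not ready after {startup_timeout} seconds")
```

Checking `proc.poll()` inside the loop lets a crashed worker surface immediately instead of only after the full timeout.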
Even in local execution mode, AReaL enforces shared storage checks via validate_shared_path. This ensures that the fileroot and name_resolve_root are accessible across processes and compatible with distributed scaling. The utility identifies network filesystems such as nfs, lustre, ceph, and cloud provider solutions like alinas (Alibaba Cloud).
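One way such a check could work on Linux is to look up the filesystem type of the mount that contains a path. The sketch below parses text in the format of `/proc/mounts`. It is an assumption about the approach, not the logic of validate_shared_path, and every function name in it is hypothetical.

```python
import os

# Types named in the text above; "nfs4" is added here as an assumption,
# since NFSv4 mounts are reported under that name on Linux.
NETWORK_FS_TYPES = {"nfs", "nfs4", "lustre", "ceph", "alinas"}


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def filesystem_type(path, mounts):
    """Return the fs type of the longest mount point containing `path`."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in mounts:
        mp = mount_point.rstrip("/") or "/"
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        # ">=" lets a later mount on the same point shadow an earlier one.
        if inside and len(mp) >= len(best_mount):
            best_mount, best_type = mp, fs_type
    return best_type


def is_shared_path(path, mounts):
    """True when `path` lives on one of the network filesystem types."""
    return filesystem_type(path, mounts) in NETWORK_FS_TYPES
```

On a real machine the mount table would come from reading `/proc/mounts`; passing it in as text keeps the sketch testable on any platform with POSIX-style paths.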
The behavior of the local launcher and scheduler is influenced by the following parameters:
| Parameter | Description |
|---|---|
| gpu_devices | List of GPU indices available for allocation. |
| log_dir | Directory where worker stdout/stderr logs are stored. |
| startup_timeout | Max seconds to wait for a worker's RPC server to start. |
| name_resolve_type | Method for worker discovery (typically nfs for local). |
| stdbuf -oL | Forces line-buffered output in launched subprocesses. |
| enable_tms_offload | Enables Tensor Management System offloading. |