
URL: https://deepwiki.com/inclusionAI/AReaL/4.4-backend-protocol-and-extensibility



Last indexed: 7 May 2026 (2e12c1)

Backend Protocol and Extensibility

Purpose and Scope

This document describes the backend protocol contract that enables AReaL to support multiple inference engines (SGLang, vLLM) through a unified abstraction layer. It explains the methods and data structures that backends must implement, how the backend composition pattern works, and how to add support for new inference backends.

For the high-level InferenceEngine API, see areal/api/engine_api.py:312-457. For implementation details of specific backends, see areal/engine/sglang_remote.py:40-245 and areal/engine/vllm_remote.py:41-271.

Backend Architecture Overview

AReaL's inference system uses a composition-based architecture that separates backend-specific protocol translation from core inference orchestration. This design enables support for multiple inference engines without duplicating common infrastructure code.

Composition Pattern

Diagram: Natural Language to Code Entity Mapping (Architecture)


Sources: areal/engine/sglang_remote.py:40-41, areal/engine/vllm_remote.py:41-42, areal/api/engine_api.py:312-316

The architecture has three layers:

| Layer | Responsibility | Examples |
| --- | --- | --- |
| Engine Wrapper | Implements InferenceEngine API, delegates all calls to RemoteInfEngine | RemoteSGLangEngine, RemotevLLMEngine |
| Core Infrastructure | Manages server discovery, request dispatching, and async coordination | RemoteInfEngine |
| Backend Protocol | Translates AReaL requests to backend-specific HTTP payloads and responses | SGLangBackend, VLLMBackend |

Sources: areal/engine/sglang_remote.py:40-127, areal/engine/vllm_remote.py:41-125

Backend Protocol Contract

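A backend implementation must provide methods for three categories of operations: generation, weight updates, and server lifecycle management. The request-building methods return data structures that RemoteInfEngine uses to issue HTTP requests; parse_generation_response does the reverse and converts a raw backend response into an HttpGenerationResult.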

Core Data Structures

Backends work with these key data types:

Diagram: Data Flow Entities (Natural Language to Code)


Sources: areal/api/io_struct.py:28-60, areal/api/io_struct.py:183-220
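Because the diagram for these types did not survive, the following is a minimal sketch of what the three backend-facing structures might look like. The class names come from this page; every field name, default value, and the placeholder endpoints "/step_one" and "/step_two" are assumptions for illustration and may differ from the definitions in areal/api/io_struct.py.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HttpRequest:
    """One HTTP call the engine should make (field names are assumed)."""
    endpoint: str
    payload: dict[str, Any] = field(default_factory=dict)
    method: str = "POST"


@dataclass
class HttpGenerationResult:
    """Backend-neutral view of one generation response (fields assumed)."""
    output_tokens: list[int]
    output_logprobs: list[float]
    stop_reason: str


@dataclass
class WeightUpdateRequests:
    """Ordered chain of requests that together perform one weight update."""
    requests: list[HttpRequest]


# A GET request with no body, and a two-step update chain.
health = HttpRequest(endpoint="/health", method="GET")
update = WeightUpdateRequests(
    requests=[HttpRequest("/step_one"), HttpRequest("/step_two")]
)
```

Keeping these types free of any transport logic is what lets a backend describe a request without sending it.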

Required Backend Methods

Generation Protocol

Backends must implement methods to build generation requests and parse responses.

Method: build_generation_request(req: ModelRequest, with_lora: bool, version: int) -> HttpRequest

Constructs a backend-specific HTTP request payload from AReaL's unified ModelRequest format.

| Backend | Endpoint | Payload Structure | Key Differences |
| --- | --- | --- | --- |
| SGLang | /generate | {input_ids, sampling_params: {...}} | Nested sampling_params (areal/engine/sglang_remote.py:56-89) |
| vLLM | /v1/completions or /v1/chat/completions | {prompt, top_p, top_k, ...} | Flat structure, supports vision chat (areal/engine/vllm_remote.py:51-94) |
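The nested-versus-flat distinction can be illustrated with two small helper functions. This is a simplified sketch, not the AReaL implementation: the helper names are hypothetical, the real method takes a ModelRequest plus with_lora and version arguments, and passing token IDs as the vLLM prompt is an assumption.

```python
from typing import Any


def sglang_style_payload(input_ids: list[int], sampling: dict[str, Any]) -> dict[str, Any]:
    # Sampling options are nested under a single "sampling_params" key.
    return {"input_ids": input_ids, "sampling_params": dict(sampling)}


def vllm_style_payload(input_ids: list[int], sampling: dict[str, Any]) -> dict[str, Any]:
    # Sampling options are flattened to the top level, next to the prompt.
    return {"prompt": input_ids, **sampling}


sampling = {"top_p": 0.9, "top_k": 50}
nested = sglang_style_payload([1, 2, 3], sampling)
flat = vllm_style_payload([1, 2, 3], sampling)
```

The same sampling options end up in both payloads; only their position in the JSON body changes.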

Method: parse_generation_response(response: dict[str, Any]) -> HttpGenerationResult

Extracts tokens, log probabilities, and stop reason from backend-specific response format.

| Backend | Response Format | Token Extraction |
| --- | --- | --- |
| SGLang | meta_info["output_token_logprobs"] | [x[1] for x in ...] (areal/engine/sglang_remote.py:119-127) |
| vLLM | choices[0]["logprobs"] | Parse "token:123" strings or content dict (areal/engine/vllm_remote.py:101-126) |
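A rough sketch of the two extraction styles follows. The response keys named in the table are taken from this page; the tuple layout (logprob, token_id, ...) for the SGLang-style entries and the parallel "tokens" / "token_logprobs" lists for the vLLM-style response are assumptions, and the function names are hypothetical.

```python
from typing import Any


def parse_sglang_style(response: dict[str, Any]) -> tuple[list[int], list[float]]:
    # Assumed entry layout: (logprob, token_id, ...), so x[1] is the token ID.
    entries = response["meta_info"]["output_token_logprobs"]
    return [x[1] for x in entries], [x[0] for x in entries]


def parse_token_string(token: str) -> int:
    # Turn a string such as "token:123" into 123 by reading what follows
    # the last colon.
    return int(token.rsplit(":", 1)[1])


def parse_vllm_style(response: dict[str, Any]) -> tuple[list[int], list[float]]:
    # Assumed layout: parallel "tokens" and "token_logprobs" lists.
    logprobs = response["choices"][0]["logprobs"]
    tokens = [parse_token_string(t) for t in logprobs["tokens"]]
    return tokens, list(logprobs["token_logprobs"])


sglang_resp = {"meta_info": {"output_token_logprobs": [(-0.1, 11, None), (-0.2, 22, None)]}}
vllm_resp = {
    "choices": [
        {"logprobs": {"tokens": ["token:11", "token:22"], "token_logprobs": [-0.1, -0.2]}}
    ]
}
```

Both parsers return the same pair of lists, which is the point of the protocol: the engine never sees the backend-specific shape.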

Weight Update Protocol

Backends must support weight update modes defined in WeightUpdateMeta.

Method: build_disk_weight_update_requests(meta: WeightUpdateMeta) -> WeightUpdateRequests

Constructs request(s) to update weights from a checkpoint file on shared storage.

Method: build_distributed_weight_update_requests(meta: WeightUpdateMeta, param_specs: list[ParamSpec]) -> WeightUpdateRequests

Constructs request(s) to update weights via NCCL/XCCL broadcast from the training engine.

| Backend | Capability | Implementation Detail |
| --- | --- | --- |
| SGLang | Full model only | Distributed LoRA is not supported (areal/engine/sglang_remote.py:169-173) |
| vLLM | Full & LoRA | Two-step: set metadata then update (areal/engine/vllm_remote.py:155-194) |

Method: build_init_weights_group_request(addr: str, server_idx: int, meta: WeightUpdateMeta) -> HttpRequest

Initializes the NCCL/XCCL process group for distributed weight updates (areal/engine/sglang_remote.py:189-205, areal/engine/vllm_remote.py:196-212).
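To illustrate how a single return type can cover both one-step and two-step updates, here is a minimal sketch. The dataclasses are simplified stand-ins for the AReaL types, and the endpoints prefixed with "/hypothetical_" as well as the payload keys are invented for illustration; they are not real SGLang or vLLM routes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HttpRequest:  # simplified stand-in, fields assumed
    endpoint: str
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass
class WeightUpdateRequests:  # simplified stand-in, fields assumed
    requests: list[HttpRequest]


def single_step_update(checkpoint_path: str) -> WeightUpdateRequests:
    # One request is enough when the server reloads weights in a single call.
    return WeightUpdateRequests(
        [HttpRequest("/hypothetical_update_from_disk", {"path": checkpoint_path})]
    )


def two_step_update(param_names: list[str]) -> WeightUpdateRequests:
    # First describe the incoming tensors, then trigger the transfer.
    # The list order is the order in which the requests must be sent.
    return WeightUpdateRequests(
        [
            HttpRequest("/hypothetical_set_update_meta", {"names": param_names}),
            HttpRequest("/hypothetical_apply_update"),
        ]
    )


def endpoints_in_order(chain: WeightUpdateRequests) -> list[str]:
    return [r.endpoint for r in chain.requests]
```

A caller can then iterate over `chain.requests` without knowing whether the backend needs one step or several.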

Server Lifecycle Protocol

Backends must implement control methods for server management:

| Method | SGLang Endpoint | vLLM Endpoint |
| --- | --- | --- |
| get_pause_request() | /pause_generation | /areal_pause_generation |
| get_resume_request() | /continue_generation | /areal_continue_generation |
| get_health_check_request() | /health (GET) | /health (GET) |
| get_offload_request() | /release_memory_occupation | /sleep (POST) |

Sources: areal/engine/sglang_remote.py:207-232, areal/engine/vllm_remote.py:214-251
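The lifecycle methods can be sketched as small factories that return request descriptions. The method names and endpoints below are taken from the table above; the ControlRequest type and both class names are hypothetical stand-ins, and POST is assumed wherever the table does not state an HTTP method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlRequest:  # hypothetical stand-in for HttpRequest
    endpoint: str
    method: str = "POST"  # assumed default where the table gives no method


class SGLangStyleLifecycle:
    def get_pause_request(self) -> ControlRequest:
        return ControlRequest("/pause_generation")

    def get_resume_request(self) -> ControlRequest:
        return ControlRequest("/continue_generation")

    def get_health_check_request(self) -> ControlRequest:
        return ControlRequest("/health", method="GET")

    def get_offload_request(self) -> ControlRequest:
        return ControlRequest("/release_memory_occupation")


class VLLMStyleLifecycle:
    def get_pause_request(self) -> ControlRequest:
        return ControlRequest("/areal_pause_generation")

    def get_resume_request(self) -> ControlRequest:
        return ControlRequest("/areal_continue_generation")

    def get_health_check_request(self) -> ControlRequest:
        return ControlRequest("/health", method="GET")

    def get_offload_request(self) -> ControlRequest:
        return ControlRequest("/sleep")
```

Because both classes expose the same four methods, the calling code can pause, resume, probe, or offload a server without branching on the backend type.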

Request/Response Flow

The following sequence shows how a generation request flows through the backend protocol:


Sources: areal/engine/sglang_remote.py:89-92, areal/engine/sglang_remote.py:122-127
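The flow can be approximated in three steps: the backend builds a request, the engine sends it, and the backend parses the reply. The sketch below assumes a much simpler interface than the real one (plain dicts and token lists instead of ModelRequest and HttpRequest), and EchoBackend, fake_server, and generate are hypothetical names; the HTTP call is replaced by an in-process function so the example runs without a server.

```python
from typing import Any, Callable


class EchoBackend:
    """Hypothetical backend that only translates payloads."""

    def build_generation_request(self, input_ids: list[int]) -> dict[str, Any]:
        return {"endpoint": "/generate", "payload": {"input_ids": input_ids}}

    def parse_generation_response(self, response: dict[str, Any]) -> dict[str, Any]:
        return {"output_tokens": response["tokens"], "stop_reason": response["finish"]}


def fake_server(endpoint: str, payload: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for the HTTP round trip: replies with the reversed input IDs.
    return {"tokens": list(reversed(payload["input_ids"])), "finish": "stop"}


def generate(
    backend: EchoBackend,
    send: Callable[[str, dict[str, Any]], dict[str, Any]],
    input_ids: list[int],
) -> dict[str, Any]:
    request = backend.build_generation_request(input_ids)  # 1. translate
    raw = send(request["endpoint"], request["payload"])     # 2. transport
    return backend.parse_generation_response(raw)           # 3. translate back


result = generate(EchoBackend(), fake_server, [1, 2, 3])
```

Only steps 1 and 3 touch the backend; step 2 is where the shared engine code would handle server selection and async dispatch.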

Implementing a New Backend

To add support for a new inference engine, follow these steps:

Step 1: Create Backend Protocol Class

Create a new backend class that implements all required methods (Generation, Weight Update, and Lifecycle). Refer to SGLangBackend (areal/engine/sglang_remote.py:40-245) or VLLMBackend (areal/engine/vllm_remote.py:41-271) as templates.

Step 2: Create Engine Wrapper Class

Create a wrapper class that composes RemoteInfEngine with the new backend. This class must inherit from InferenceEngine (areal/api/engine_api.py:312).


Sources: areal/api/engine_api.py:312-316
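The wrapper pattern might look roughly like the sketch below. InferenceEngine and RemoteInfEngine here are local stand-ins with a single method each, not the real AReaL classes, and MyBackend, RemoteMyEngine, and the "/my_generate" endpoint are hypothetical; the real constructor arguments and method set are not shown on this page.

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):  # stand-in for the AReaL API class
    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class MyBackend:
    """Hypothetical backend: protocol translation only, no state."""

    def build_generation_request(self, prompt: str) -> dict:
        return {"endpoint": "/my_generate", "payload": {"text": prompt}}


class RemoteInfEngine:  # stand-in for the shared core
    def __init__(self, backend: MyBackend) -> None:
        self.backend = backend

    def generate(self, prompt: str) -> str:
        request = self.backend.build_generation_request(prompt)
        # The real engine would send the request; here we just describe it.
        return f"POST {request['endpoint']}"


class RemoteMyEngine(InferenceEngine):
    """Wrapper: inherits the API, delegates every call to the core."""

    def __init__(self) -> None:
        self._engine = RemoteInfEngine(MyBackend())

    def generate(self, prompt: str) -> str:
        return self._engine.generate(prompt)
```

The wrapper adds no logic of its own, so supporting another inference engine only requires a new backend class and a few lines of delegation.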

Step 3: Handle Backend-Specific Features

Different backends have different capabilities:

| Feature | SGLang | vLLM | Implementation Notes |
| --- | --- | --- | --- |
| LoRA over XCCL | ❌ No | ✅ Yes | Handle in build_distributed_weight_update_requests (areal/engine/vllm_remote.py:165-180) |
| Multi-step Updates | ❌ No | ✅ Yes | Return list of requests in WeightUpdateRequests (areal/engine/vllm_remote.py:185-194) |
| Vision Input | Base64 string | OpenAI messages | Handle in build_generation_request (areal/engine/vllm_remote.py:74-93) |

Sources: areal/engine/sglang_remote.py:169-173, areal/engine/vllm_remote.py:155-194

Key Design Principles

  1. Protocol Translation Only: Backend classes only translate between AReaL's data structures and backend-specific HTTP payloads. They do not implement business logic.
  2. Pure Composition: Engine wrappers delegate all functionality to RemoteInfEngine.
  3. Stateless Backends: Backend classes maintain no state; all state is managed by RemoteInfEngine.
  4. Flexible Request Chains: WeightUpdateRequests can contain multiple HttpRequest objects, enabling backends that require multi-step protocols (like vLLM's metadata + update pattern).

Sources: areal/engine/sglang_remote.py:40-187, areal/engine/vllm_remote.py:41-194