Last indexed: 7 May 2026 (2e12c1)

Backend Protocol and Extensibility

Purpose and Scope

This document describes the backend protocol contract that enables AReaL to support multiple inference engines (SGLang, vLLM) through a unified abstraction layer. It explains the methods and data structures that backends must implement, how the backend composition pattern works, and how to add support for new inference backends.

For the high-level InferenceEngine API, see areal/api/engine_api.py312-457 For implementation details of specific backends, see areal/engine/sglang_remote.py40-245 and areal/engine/vllm_remote.py41-271

Backend Architecture Overview

AReaL's inference system uses a composition-based architecture that separates backend-specific protocol translation from core inference orchestration. This design enables support for multiple inference engines without duplicating common infrastructure code.

Composition Pattern

Diagram: Natural Language to Code Entity Mapping (Architecture)

Sources: areal/engine/sglang_remote.py40-41 areal/engine/vllm_remote.py41-42 areal/api/engine_api.py312-316

The architecture has three layers:

Layer	Responsibility	Examples
Engine Wrapper	Implements `InferenceEngine` API, delegates all calls to `RemoteInfEngine`	`RemoteSGLangEngine`, `RemotevLLMEngine`
Core Infrastructure	Manages server discovery, request dispatching, and async coordination	`RemoteInfEngine`
Backend Protocol	Translates AReaL requests to backend-specific HTTP payloads and responses	`SGLangBackend`, `VLLMBackend`

Sources: areal/engine/sglang_remote.py40-127 areal/engine/vllm_remote.py41-125

Backend Protocol Contract

A backend implementation must provide methods for three categories of operations: generation, weight updates, and server lifecycle management. All methods return data structures that RemoteInfEngine can use to make HTTP requests.

Core Data Structures

Backends work with these key data types:

Diagram: Data Flow Entities (Natural Language to Code)

Sources: areal/api/io_struct.py28-60 areal/api/io_struct.py183-220

Required Backend Methods

Generation Protocol

Backends must implement methods to build generation requests and parse responses.

Method: build_generation_request(req: ModelRequest, with_lora: bool, version: int) -> HttpRequest

Constructs backend-specific HTTP request payload from AReaL's unified ModelRequest format.

Backend	Endpoint	Payload Structure	Key Differences
SGLang	`/generate`	`{input_ids, sampling_params: {...}}`	Nested `sampling_params` areal/engine/sglang_remote.py56-89
vLLM	`/v1/completions` or `/v1/chat/completions`	`{prompt, top_p, top_k, ...}`	Flat structure, supports vision chat areal/engine/vllm_remote.py51-94

Method: parse_generation_response(response: dict[str, Any]) -> HttpGenerationResult

Extracts tokens, log probabilities, and stop reason from backend-specific response format.

Backend	Response Format	Token Extraction
SGLang	`meta_info["output_token_logprobs"]`	`[x[1] for x in ...]` areal/engine/sglang_remote.py119-127
vLLM	`choices[0]["logprobs"]`	Parse `"token:123"` strings or content dict areal/engine/vllm_remote.py101-126

Weight Update Protocol

Backends must support weight update modes defined in WeightUpdateMeta.

Method: build_disk_weight_update_requests(meta: WeightUpdateMeta) -> WeightUpdateRequests

Constructs request(s) to update weights from a checkpoint file on shared storage.

SGLang: Uses /update_weights_from_disk or /load_lora_adapter areal/engine/sglang_remote.py129-159
vLLM: Uses /areal_update_weights or /v1/load_lora_adapter areal/engine/vllm_remote.py129-148

Method: build_distributed_weight_update_requests(meta: WeightUpdateMeta, param_specs: list[ParamSpec]) -> WeightUpdateRequests

Constructs request(s) to update weights via NCCL/XCCL broadcast from training engine.

Backend	Capability	Implementation Detail
SGLang	Full model only	Distributed LoRA is not supported areal/engine/sglang_remote.py169-173
vLLM	Full & LoRA	Two-step: set metadata then update areal/engine/vllm_remote.py155-194

Method: build_init_weights_group_request(addr: str, server_idx: int, meta: WeightUpdateMeta) -> HttpRequest

Initializes NCCL/XCCL process group for distributed weight updates areal/engine/sglang_remote.py189-205 areal/engine/vllm_remote.py196-212

Server Lifecycle Protocol

Backends must implement control methods for server management:

Method	SGLang Endpoint	vLLM Endpoint
`get_pause_request()`	`/pause_generation`	`/areal_pause_generation`
`get_resume_request()`	`/continue_generation`	`/areal_continue_generation`
`get_health_check_request()`	`/health` (GET)	`/health` (GET)
`get_offload_request()`	`/release_memory_occupation`	`/sleep` (POST)

Sources: areal/engine/sglang_remote.py207-232 areal/engine/vllm_remote.py214-251

Request/Response Flow

The following sequence shows how a generation request flows through the backend protocol:

Sources: areal/engine/sglang_remote.py89-92 areal/engine/sglang_remote.py122-127

Implementing a New Backend

To add support for a new inference engine, follow these steps:

Step 1: Create Backend Protocol Class

Create a new backend class that implements all required methods (Generation, Weight Update, and Lifecycle). Refer to SGLangBackend areal/engine/sglang_remote.py40-245 or VLLMBackend areal/engine/vllm_remote.py41-271 as templates.

Step 2: Create Engine Wrapper Class

Create a wrapper class that composes RemoteInfEngine with the new backend. This class must inherit from InferenceEngine areal/api/engine_api.py312

Sources: areal/api/engine_api.py312-316

Step 3: Handle Backend-Specific Features

Different backends have different capabilities:

Feature	SGLang	vLLM	Implementation Notes
LoRA over XCCL	❌ No	✅ Yes	Handle in `build_distributed_weight_update_requests` areal/engine/vllm_remote.py165-180
Multi-step Updates	❌ No	✅ Yes	Return list of requests in `WeightUpdateRequests` areal/engine/vllm_remote.py185-194
Vision Input	Base64 string	OpenAI messages	Handle in `build_generation_request` areal/engine/vllm_remote.py74-93

Sources: areal/engine/sglang_remote.py169-173 areal/engine/vllm_remote.py155-194

Key Design Principles

Protocol Translation Only: Backend classes only translate between AReaL's data structures and backend-specific HTTP payloads. They do not implement business logic.
Pure Composition: Engine wrappers delegate all functionality to RemoteInfEngine.
Stateless Backends: Backend classes maintain no state; all state is managed by RemoteInfEngine.
Flexible Request Chains: WeightUpdateRequests can contain multiple HttpRequest objects, enabling backends that require multi-step protocols (like vLLM's metadata + update pattern).

Sources: areal/engine/sglang_remote.py40-187 areal/engine/vllm_remote.py41-194

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/4.4-backend-protocol-and-extensibility