Last indexed: 7 May 2026 (2e12c1)

Request and Response Types

This page documents the core data structures used for communication between inference engines and workflows in AReaL. These types define the contract for model inference requests and responses, supporting both text-only models and vision-language models (VLMs).

Scope: This page focuses on the request/response structures themselves: their fields, methods, and data formats. For how these structures are used in workflows, see Workflow and Rollout System. For HTTP-based communication protocols with remote inference servers, see SGLang Backend and vLLM Backend.

Overview

AReaL's inference system uses strongly-typed dataclasses to represent inference requests and responses. The primary types are:

Type	Purpose	Key Fields
`ModelRequest`	Inference request sent to engines	`input_ids`, `gconfig`, `image_data`
`ModelResponse`	Inference response from engines	`output_tokens`, `output_logprobs`, `stop_reason`
`WeightUpdateMeta`	Metadata for weight synchronization	`type`, `path`, `use_lora`, `version`
`HttpRequest`	HTTP request wrapper for remote servers	`endpoint`, `payload`, `method`
`HttpGenerationResult`	Parsed HTTP generation response	`output_tokens`, `output_logprobs`

These types are defined in areal/api/io_struct.py27-376 and serve as the foundation for all inference operations in rollout workflows.

Sources: areal/api/io_struct.py27-376

ModelRequest Structure

Request Flow Diagram

Sources: areal/api/io_struct.py28-59 areal/engine/sglang_remote.py40-41 areal/engine/vllm_remote.py41-42 areal/api/workflow_api.py14-18

Field Reference

The ModelRequest dataclass is defined at areal/api/io_struct.py28-59:

Field	Type	Default	Purpose
`rid`	`str`	UUID	Unique request identifier for tracking areal/api/io_struct.py29
`input_ids`	`list[int]`	`[]`	Tokenized input sequence areal/api/io_struct.py30
`gconfig`	`GenerationHyperparameters`	default instance	Sampling parameters areal/api/io_struct.py31-33
`metadata`	`dict[str, Any]`	`{}`	Arbitrary metadata passed through workflow areal/api/io_struct.py34
`tokenizer`	`PreTrainedTokenizerFast`	`None`	Tokenizer for encode/decode operations areal/api/io_struct.py36
`image_data`	`list[str]`	`[]`	Base64-encoded images (VLM only) areal/api/io_struct.py39
`processor`	`AutoProcessor`	`None`	Processor for multi-modal inputs areal/api/io_struct.py40
`vision_msg_vllm`	`list`	`None`	vLLM-specific vision message format areal/api/io_struct.py43

Generation Configuration: The gconfig field holds a GenerationHyperparameters instance, which carries sampling parameters such as temperature, top_p, n_samples, and max_new_tokens areal/api/cli_args.py17. These are typically configured in YAML files under the gconfig block examples/math/gsm8k_grpo_lora.yaml36-42.

Vision-Language Model Support: The image_data and vision_msg_vllm fields enable multi-modal inference. SGLang places image_data directly in its request payload areal/engine/sglang_remote.py71, while vLLM constructs chat messages from vision_msg_vllm and image_data, automatically detecting the MIME type of each image areal/engine/vllm_remote.py74-91.
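As a rough illustration of the MIME-detection step, the sketch below guesses an image type from the leading bytes of a base64 string and wraps it in an OpenAI-style `image_url` content part. The helper names `guess_mime` and `to_image_part`, the magic-byte table, and the JPEG fallback are all assumptions made for this example; the actual logic in areal/engine/vllm_remote.py may differ.

```python
import base64

# Hypothetical lookup table: magic-byte prefixes for a few common image formats.
_MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def guess_mime(b64_image: str, default: str = "image/jpeg") -> str:
    """Guess a MIME type from the first bytes of a base64-encoded image."""
    # 16 base64 characters decode to 12 raw bytes, enough for the prefixes above.
    head = base64.b64decode(b64_image[:16])
    for magic, mime in _MAGIC_PREFIXES:
        if head.startswith(magic):
            return mime
    return default  # assumption: fall back to JPEG when nothing matches


def to_image_part(b64_image: str) -> dict:
    """Wrap a base64 image as a chat content part using a data URL."""
    url = f"data:{guess_mime(b64_image)};base64,{b64_image}"
    return {"type": "image_url", "image_url": {"url": url}}
```

Decoding only a short prefix keeps detection cheap even for large images, since the full payload is passed through untouched in the data URL.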

Sources: areal/api/io_struct.py28-59 areal/engine/sglang_remote.py71 areal/engine/vllm_remote.py74-91 examples/math/gsm8k_grpo_lora.yaml36-42

Request Methods

copy()

Creates a deep copy of the request with independent field values areal/api/io_struct.py45-59.

This method is used when workflows need to create multiple similar requests (e.g., for sampling multiple responses with n_samples > 1).

Sources: areal/api/io_struct.py45-59
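The copy semantics can be sketched with a stand-in dataclass. `SketchRequest` and `SketchGConfig` are hypothetical simplifications of ModelRequest and GenerationHyperparameters, and sharing the tokenizer by reference while copying the other fields is an assumption of this sketch, not a statement about the real implementation.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchGConfig:
    """Hypothetical stand-in for GenerationHyperparameters."""
    n_samples: int = 1
    max_new_tokens: int = 16
    temperature: float = 1.0


@dataclass
class SketchRequest:
    """Hypothetical stand-in for ModelRequest (text-only fields)."""
    rid: str = field(default_factory=lambda: str(uuid.uuid4()))
    input_ids: list[int] = field(default_factory=list)
    gconfig: SketchGConfig = field(default_factory=SketchGConfig)
    metadata: dict[str, Any] = field(default_factory=dict)
    tokenizer: Any = None

    def copy(self) -> "SketchRequest":
        # Mutable fields are duplicated so edits to the copy never leak back.
        return SketchRequest(
            rid=self.rid,
            input_ids=list(self.input_ids),
            gconfig=copy.deepcopy(self.gconfig),
            metadata=copy.deepcopy(self.metadata),
            tokenizer=self.tokenizer,  # assumption: shared, treated as read-only
        )


base = SketchRequest(input_ids=[1, 2, 3])
clone = base.copy()
clone.input_ids.append(4)          # does not affect base.input_ids
clone.gconfig.max_new_tokens = 64  # does not affect base.gconfig
```

Using `default_factory` for the list and dict defaults avoids the shared-mutable-default pitfall, which matters when many requests are created from the same template.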

ModelResponse Structure

Response Flow Diagram

Sources: areal/api/io_struct.py63-131 areal/engine/sglang_remote.py91-127 areal/engine/vllm_remote.py98-127 areal/api/workflow_api.py14-18

Field Reference

The ModelResponse dataclass is defined at areal/api/io_struct.py63-131:

Field	Type	Default	Purpose
`input_tokens`	`list[int]`	`[]`	Echo of input token IDs areal/api/io_struct.py65
`output_tokens`	`list[int]`	`[]`	Generated output token IDs areal/api/io_struct.py66
`output_logprobs`	`list[float]`	`[]`	Log probabilities for each token areal/api/io_struct.py67
`output_versions`	`list[int]`	`[]`	Model version used for generation areal/api/io_struct.py68
`stop_reason`	`Literal`	`"stop"`	Termination reason areal/api/io_struct.py69
`tokenizer`	`PreTrainedTokenizerFast`	`None`	Tokenizer for decode operations areal/api/io_struct.py71
`input_images`	`list[ImageObject | str]`	`[]`	Input images (VLM only) areal/api/io_struct.py74
`latency`	`float`	`inf`	Total request latency areal/api/io_struct.py78
`ttft`	`float`	`inf`	Time to first token areal/api/io_struct.py79
`itl`	`list[float]`	`[]`	Inter-token latencies areal/api/io_struct.py80
`routed_experts`	`np.ndarray`	`None`	MoE routing information areal/api/io_struct.py83

Version Tracking: The output_versions field records the model weight versions used during generation. This makes it possible to measure weight staleness in asynchronous training, where rollouts and training run concurrently.
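One way to use these versions is to compare them with the trainer's current weight version. The helper below is a hypothetical sketch: the name `max_staleness` and the definition of staleness as "current version minus the oldest version in the response" are assumptions for illustration, not part of AReaL's API.

```python
def max_staleness(output_versions: list[int], current_version: int) -> int:
    """Return how many versions behind the oldest recorded version is."""
    if not output_versions:
        return 0  # nothing was generated, so nothing can be stale
    return current_version - min(output_versions)
```

A workflow could, for example, drop or down-weight samples whose staleness exceeds a configured threshold.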

Stop Reason Types:

"length": Hit max_new_tokens limit areal/api/io_struct.py69
"stop": Generated EOS or stop token areal/api/io_struct.py69
"tool_calls": Tool call detected areal/api/io_struct.py69
"abort": Request aborted due to error areal/api/io_struct.py69

Sources: areal/api/io_struct.py63-83

Response Properties

input_len and output_len

Simple length accessors for the input and output token lists areal/api/io_struct.py85-91.

end_with_stop

Checks whether the output ends with an EOS or PAD token areal/api/io_struct.py93-104.

output_tokens_without_stop

Returns the output tokens with trailing EOS/PAD tokens removed areal/api/io_struct.py107-130. This matters when preparing training data, because stop tokens should typically not receive gradients during the RL update.

Sources: areal/api/io_struct.py85-131
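The properties above can be approximated as follows. `SketchResponse` and `SketchTokenizer` are hypothetical stand-ins; the real ModelResponse relies on a Hugging Face tokenizer, and its handling of a missing tokenizer or missing pad token may differ from what is assumed here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchTokenizer:
    """Hypothetical tokenizer exposing only the two IDs the logic needs."""
    eos_token_id: int = 2
    pad_token_id: int = 0


@dataclass
class SketchResponse:
    """Hypothetical stand-in for ModelResponse."""
    input_tokens: list[int] = field(default_factory=list)
    output_tokens: list[int] = field(default_factory=list)
    tokenizer: Any = None

    @property
    def input_len(self) -> int:
        return len(self.input_tokens)

    @property
    def output_len(self) -> int:
        return len(self.output_tokens)

    def _stop_ids(self) -> set[int]:
        # Assumption: without a tokenizer, no token is treated as a stop token.
        if self.tokenizer is None:
            return set()
        return {self.tokenizer.eos_token_id, self.tokenizer.pad_token_id}

    @property
    def end_with_stop(self) -> bool:
        return bool(self.output_tokens) and self.output_tokens[-1] in self._stop_ids()

    @property
    def output_tokens_without_stop(self) -> list[int]:
        stop_ids = self._stop_ids()
        end = len(self.output_tokens)
        # Walk back over every trailing EOS/PAD token, not just the last one.
        while end > 0 and self.output_tokens[end - 1] in stop_ids:
            end -= 1
        return self.output_tokens[:end]


resp = SketchResponse(input_tokens=[1], output_tokens=[5, 6, 2, 0],
                      tokenizer=SketchTokenizer())
trimmed = resp.output_tokens_without_stop  # trailing EOS (2) and PAD (0) removed
```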

Weight Update and Checkpoint Structures

WeightUpdateMeta

The WeightUpdateMeta structure defines how model weights are synchronized from training to inference engines areal/api/io_struct.py183-244.

Field	Type	Purpose
`type`	`"disk" | "xccl" | "awex"`	Transfer mechanism areal/api/io_struct.py184
`path`	`str | None`	Path for disk-based updates areal/api/io_struct.py185
`use_lora`	`bool`	Whether this is a LoRA update areal/api/io_struct.py193
`lora_name`	`str`	Adapter name areal/api/io_struct.py194
`version`	`int | None`	Weight version index areal/api/io_struct.py201

The with_version method creates a copy of the metadata with a versioned path (e.g., weight_update_v1) areal/api/io_struct.py203-215.

Sources: areal/api/io_struct.py183-215
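A minimal sketch of the versioned-copy idea, assuming the version suffix is appended as a subdirectory of the base path and only for disk-based updates. `SketchWeightUpdateMeta` is a hypothetical stand-in that carries only the fields listed above; the real method may build the path differently.

```python
import os
from dataclasses import dataclass, replace
from typing import Literal, Optional


@dataclass
class SketchWeightUpdateMeta:
    """Hypothetical stand-in for WeightUpdateMeta."""
    type: Literal["disk", "xccl", "awex"]
    path: Optional[str] = None
    use_lora: bool = False
    lora_name: str = ""
    version: Optional[int] = None

    def with_version(self, version: int) -> "SketchWeightUpdateMeta":
        path = self.path
        # Assumption: only disk updates have a path worth versioning.
        if self.type == "disk" and path is not None:
            path = os.path.join(path, f"weight_update_v{version}")
        # dataclasses.replace returns a new instance; self is left untouched.
        return replace(self, path=path, version=version)


base_meta = SketchWeightUpdateMeta(type="disk", path="/tmp/ckpt")
v1_meta = base_meta.with_version(1)
```

In this sketch, with_version should always be called on the unversioned base object; calling it on an already versioned copy would nest one versioned directory inside another.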

ParamSpec

Describes a single model parameter for distributed weight updates areal/api/io_struct.py150-159.

Field	Type	Purpose
`name`	`str`	Parameter name areal/api/io_struct.py151
`shape`	`tuple`	Tensor dimensions areal/api/io_struct.py152
`dtype`	`str`	Data type string areal/api/io_struct.py153

The size property calculates the byte size of the parameter from its shape and data type areal/api/io_struct.py155-158.

Sources: areal/api/io_struct.py150-159
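The byte-size calculation amounts to the element count multiplied by the element width. In the sketch below, `SketchParamSpec` and the `_DTYPE_BYTES` table are hypothetical; the real ParamSpec may resolve dtype strings through a tensor library rather than a hand-written table.

```python
import math
from dataclasses import dataclass

# Hypothetical mapping from dtype strings to bytes per element.
_DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


@dataclass
class SketchParamSpec:
    """Hypothetical stand-in for ParamSpec."""
    name: str
    shape: tuple
    dtype: str

    @property
    def size(self) -> int:
        # math.prod of an empty shape is 1, so scalars are handled correctly.
        return math.prod(self.shape) * _DTYPE_BYTES[self.dtype]


spec = SketchParamSpec(name="layers.0.weight", shape=(1024, 4096), dtype="bfloat16")
num_bytes = spec.size  # 1024 * 4096 elements at 2 bytes each
```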

HTTP Request Structures

HttpRequest and HttpGenerationResult

For remote inference servers accessed via HTTP, AReaL uses wrapper structures defined in areal/api/io_struct.py.

Sources: areal/api/io_struct.py278-294 areal/engine/sglang_remote.py89 areal/engine/vllm_remote.py93-96

HttpRequest

Defined at areal/api/io_struct.py278-284:

Field	Type	Default	Purpose
`endpoint`	`str`	(required)	API endpoint path areal/api/io_struct.py279
`payload`	`dict[str, Any]`	(required)	Request payload areal/api/io_struct.py280
`method`	`str`	`"POST"`	HTTP method areal/api/io_struct.py281

HttpGenerationResult

Defined at areal/api/io_struct.py287-294:

Field	Type	Default	Purpose
`output_tokens`	`list[int]`	(required)	Generated token IDs areal/api/io_struct.py288
`output_logprobs`	`list[float]`	(required)	Log probabilities areal/api/io_struct.py289
`stop_reason`	`str`	(required)	Termination reason areal/api/io_struct.py290
`routed_experts`	`np.ndarray`	`None`	MoE routing (optional) areal/api/io_struct.py291

Sources: areal/api/io_struct.py278-294
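To show how the request wrapper might be populated, the sketch below builds a generation request from token IDs and two sampling parameters. `SketchHttpRequest` mirrors the three documented fields; the `build_generate_request` helper, the `/generate` endpoint, and the payload keys are illustrative assumptions, since the actual payloads are assembled by the backend-specific code in areal/engine.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchHttpRequest:
    """Hypothetical stand-in for HttpRequest."""
    endpoint: str
    payload: dict[str, Any]
    method: str = "POST"


def build_generate_request(
    input_ids: list[int], max_new_tokens: int, temperature: float
) -> SketchHttpRequest:
    """Assemble an illustrative generation request (payload shape assumed)."""
    payload = {
        "input_ids": list(input_ids),
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }
    return SketchHttpRequest(endpoint="/generate", payload=payload)


def full_url(base_url: str, request: SketchHttpRequest) -> str:
    """Join a server address with the request's endpoint path."""
    return base_url.rstrip("/") + request.endpoint


req = build_generate_request([1, 2], max_new_tokens=8, temperature=0.7)
url = full_url("http://localhost:30000/", req)
```

Keeping the endpoint and payload in a plain data object lets the same sending and retry logic serve different backends, with only the request construction being backend-specific.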

MoE Routing Information

For Mixture-of-Experts (MoE) models, ModelResponse.routed_experts contains the routing decisions areal/api/io_struct.py83.

In the SGLang backend, routing information is extracted from the meta_info["routed_experts"] field of the engine response, decoded from base64, and reshaped to (num_tokens, num_layers * expert_top_k) areal/engine/sglang_remote.py101-109.

Sources: areal/api/io_struct.py83 areal/engine/sglang_remote.py101-109
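The decode-and-reshape step can be sketched with the standard library alone. The function name `decode_routed_experts`, the little-endian int32 element type, and the nested-list return value are assumptions for this example; the real code returns an `np.ndarray`, and a NumPy version would typically use `np.frombuffer(raw, dtype=...)` followed by `.reshape(num_tokens, -1)`.

```python
import base64
import struct


def decode_routed_experts(
    b64_payload: str, num_tokens: int, row_width: int
) -> list[list[int]]:
    """Decode base64 expert IDs into num_tokens rows of row_width entries.

    row_width corresponds to num_layers * expert_top_k.
    """
    raw = base64.b64decode(b64_payload)
    expected_bytes = 4 * num_tokens * row_width  # assumption: 4-byte int32 values
    if len(raw) != expected_bytes:
        raise ValueError(
            f"expected {expected_bytes} bytes, got {len(raw)}"
        )
    flat = struct.unpack(f"<{num_tokens * row_width}i", raw)
    # Slice the flat sequence into one row per token.
    return [
        list(flat[i * row_width:(i + 1) * row_width]) for i in range(num_tokens)
    ]


# Three tokens, each with two routed expert IDs.
example_payload = base64.b64encode(struct.pack("<6i", 0, 1, 2, 3, 4, 5)).decode()
routing = decode_routed_experts(example_payload, num_tokens=3, row_width=2)
```

Validating the byte length before reshaping surfaces a mismatch between the expected token count and the engine's payload early, rather than producing silently misaligned rows.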


URL: https://deepwiki.com/inclusionAI/AReaL/11.1-request-and-response-types