Last indexed: 7 May 2026 (2e12c1)

Generation Hyperparameters

Purpose and Scope

This page documents the GenerationHyperparameters configuration class, which controls text generation behavior during inference and rollout phases in AReaL. These hyperparameters determine how the model samples tokens, when to stop generation, and how to handle special tokens.

For information about configuring inference engines themselves (SGLang, vLLM server settings), see Inference Engine Configurations. For overall configuration system concepts, see Configuration Overview.

Overview

The GenerationHyperparameters dataclass areal/api/cli_args.py163-211 encapsulates all parameters that control token generation during inference. It serves three primary purposes:

Training Rollouts: Controls generation during RL training data collection (e.g., in PPOActorConfig or GRPOActorConfig).
Evaluation: Separate hyperparameters can be specified via eval_gconfig for evaluation runs.
Agentic RL: Converts to OpenAI-compatible API formats for agent framework integration via the areal.api.io_struct.ModelRequest areal/api/io_struct.py28-60

The class is instantiated in experiment configurations (PPO, GRPO, SFT) as the gconfig parameter and optionally as eval_gconfig for evaluation-specific settings examples/math/gsm8k_grpo_lora.yaml36-43

Sources: areal/api/cli_args.py163-211 areal/api/io_struct.py28-60 examples/math/gsm8k_grpo_lora.yaml36-43

Class Structure

[Diagram: "Code Entity Space: Generation Configuration", showing the GenerationHyperparameters class and its primary methods for interacting with the rest of the system.]

Sources: areal/api/cli_args.py163-211

Core Parameters

Sampling Parameters

These parameters control the stochastic sampling process during token generation areal/api/cli_args.py181-196

Parameter	Type	Default	Description
`temperature`	float	1.0	Sampling temperature. Higher values (>1.0) increase diversity; lower values (<1.0) make outputs more deterministic areal/api/cli_args.py193-196
`top_p`	float	1.0	Nucleus sampling threshold. Only considers tokens in the top-p cumulative probability mass areal/api/cli_args.py185-188
`top_k`	int	100,000,000	Top-K sampling. Only considers the K highest probability tokens areal/api/cli_args.py189-192
`greedy`	bool	False	Use greedy decoding (always select highest probability token). Overrides temperature/top_p/top_k when enabled areal/api/cli_args.py181-184
`use_beam_search`	bool	False	Enable beam search in vLLM. When enabled, sampling parameters are automatically ignored areal/api/cli_args.py209-211

Sampling Interaction Rules:

When greedy=True: temperature, top_p, and top_k are ignored and decoding is deterministic.
When use_beam_search=True: temperature, top_p, and top_k are ignored (vLLM-specific) areal/api/cli_args.py209-211
In SGLangBackend, if greedy=True, temperature is forced to 0.0 in the payload areal/engine/sglang_remote.py60

Sources: areal/api/cli_args.py181-196 areal/api/cli_args.py209-211 areal/engine/sglang_remote.py56-65
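The greedy override described above can be illustrated with a minimal sketch. SamplingConfig and build_sampling_params are hypothetical stand-ins written for this page, not AReaL code; only the field names, defaults, and the "greedy forces temperature to 0.0" rule come from the text above.

```python
from dataclasses import dataclass


@dataclass
class SamplingConfig:
    """Hypothetical stand-in for the sampling fields of GenerationHyperparameters."""

    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 100_000_000
    greedy: bool = False


def build_sampling_params(cfg: SamplingConfig) -> dict:
    """Build a sampling dict, forcing temperature to 0.0 when greedy is set."""
    return {
        # Assumption: mirrors the SGLang rule described above.
        "temperature": 0.0 if cfg.greedy else cfg.temperature,
        "top_p": cfg.top_p,
        "top_k": cfg.top_k,
    }


print(build_sampling_params(SamplingConfig(temperature=0.8, greedy=True)))
```

With greedy=True the configured temperature of 0.8 is replaced by 0.0, while a non-greedy config passes its temperature through unchanged.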

Length Control

Parameters that control the number of tokens generated areal/api/cli_args.py166-180

Parameter	Type	Default	Description
`n_samples`	int	1	Number of sequences to generate per prompt. Used for over-generation in RL algorithms like GRPO areal/api/cli_args.py166-168
`max_new_tokens`	int	16384	Maximum number of NEW tokens to generate (excluding prompt) areal/api/cli_args.py169-171
`min_new_tokens`	int	0	Minimum number of tokens that must be generated before stopping is allowed areal/api/cli_args.py172-174
`max_tokens`	int	32768	Maximum total sequence length including prompt and generated tokens areal/api/cli_args.py175-180

Length Limit Precedence:

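Generation stops as soon as any one of the following holds:

max_new_tokens new tokens have been generated.
The total sequence length (prompt plus generated tokens) reaches max_tokens.
A stop condition is met and at least min_new_tokens tokens have been generated.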

Sources: areal/api/cli_args.py166-180
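As a rough illustration of how these limits combine, the sketch below expresses the three stopping rules as a single predicate. The function names and the hit_stop_condition flag are hypothetical; the actual check happens inside the inference backend, and this is only one plausible reading of the rules listed above.

```python
def should_stop(
    prompt_len: int,
    n_generated: int,
    hit_stop_condition: bool,
    max_new_tokens: int = 16384,
    min_new_tokens: int = 0,
    max_tokens: int = 32768,
) -> bool:
    """Return True if any of the three length or stop rules applies."""
    if n_generated >= max_new_tokens:
        return True  # budget of new tokens exhausted
    if prompt_len + n_generated >= max_tokens:
        return True  # total sequence length exhausted
    # A stop token or stop string only counts once min_new_tokens is satisfied.
    return hit_stop_condition and n_generated >= min_new_tokens


def effective_new_token_budget(
    prompt_len: int, max_new_tokens: int = 16384, max_tokens: int = 32768
) -> int:
    """Largest number of new tokens that both limits allow for this prompt."""
    return max(0, min(max_new_tokens, max_tokens - prompt_len))
```

Under the defaults, a long prompt is limited by max_tokens rather than max_new_tokens: the tighter of the two limits always wins.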

Stop Conditions

Parameters that determine when generation should terminate areal/api/cli_args.py197-211

Parameter	Type	Default	Description
`stop_token_ids`	list[int]	[]	Token IDs that trigger generation stop when sampled areal/api/cli_args.py197-200
`stop`	list[str] \| None	None	String sequences that stop generation when they appear in the generated text areal/api/cli_args.py210-211
`ignore_eos`	bool	False	When True, generation continues even when EOS token is sampled areal/api/cli_args.py201-204

Helper Method: The new_with_stop_and_pad_token_ids(tokenizer) method areal/api/cli_args.py223-232 automatically adds the tokenizer's pad_token_id and eos_token_id to stop_token_ids unless ignore_eos=True.

Sources: areal/api/cli_args.py197-211 areal/api/cli_args.py223-232
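The helper's described behavior can be sketched in isolation as follows. FakeTokenizer, StopConfig, and with_stop_and_pad_token_ids are hypothetical names used only for this illustration. Returning a new object, skipping None IDs, and de-duplicating are assumptions of the sketch, not confirmed details of the AReaL method.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class FakeTokenizer:
    """Hypothetical tokenizer exposing only the two IDs the helper reads."""

    pad_token_id: Optional[int] = 0
    eos_token_id: Optional[int] = 2


@dataclass
class StopConfig:
    """Hypothetical stand-in for the stop-related fields."""

    stop_token_ids: list[int] = field(default_factory=list)
    ignore_eos: bool = False


def with_stop_and_pad_token_ids(cfg: StopConfig, tokenizer: FakeTokenizer) -> StopConfig:
    """Return a copy of cfg whose stop_token_ids include the pad and EOS IDs."""
    ids = list(cfg.stop_token_ids)  # copy so the input config is not mutated
    if not cfg.ignore_eos:
        for token_id in (tokenizer.pad_token_id, tokenizer.eos_token_id):
            # Assumption: missing IDs are skipped and duplicates are avoided.
            if token_id is not None and token_id not in ids:
                ids.append(token_id)
    return replace(cfg, stop_token_ids=ids)


updated = with_stop_and_pad_token_ids(StopConfig(stop_token_ids=[7]), FakeTokenizer())
print(updated.stop_token_ids)
```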

Additional Parameters

Parameter	Type	Default	Description
`frequency_penalty`	float	0.0	Penalizes tokens based on their frequency in the sequence areal/api/cli_args.py210-211
`skip_special_tokens`	bool	True	Skip special tokens when decoding/displaying outputs areal/api/cli_args.py205-208
`lora_name`	str	"default_lora"	LoRA adapter name to use for this generation request areal/api/cli_args.py210-211

Sources: areal/api/cli_args.py205-211

Configuration Usage

Programmatic Usage

The new() method areal/api/cli_args.py213-221 creates updated configurations while maintaining compatibility with the configuration system.

Sources: areal/api/cli_args.py213-221
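One common way to implement such an update method on a dataclass is dataclasses.replace, which returns a modified copy. The sketch below assumes that pattern. GenConfig is a hypothetical, trimmed-down stand-in, and the page does not state whether the real new() accepts keyword overrides or is implemented this way.

```python
from dataclasses import dataclass, replace


@dataclass
class GenConfig:
    """Hypothetical subset of GenerationHyperparameters fields."""

    n_samples: int = 1
    max_new_tokens: int = 16384
    temperature: float = 1.0
    greedy: bool = False

    def new(self, **overrides) -> "GenConfig":
        """Return a copy with the given fields overridden (assumed behavior)."""
        return replace(self, **overrides)


# Training rollouts sample several sequences per prompt.
gconfig = GenConfig(n_samples=8, temperature=1.0)
# A derived evaluation config; the original object is left unchanged.
eval_gconfig = gconfig.new(n_samples=1, greedy=True)
print(gconfig, eval_gconfig, sep="\n")
```

Deriving eval_gconfig from gconfig in this way keeps shared settings such as max_new_tokens in one place while overriding only what differs for evaluation.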

System Integration Flow

[Diagram: "Natural Language Space to Code Entity Space Mapping", showing how GenerationHyperparameters (Natural Language Space config) flows into the distributed InferenceEngine and TrainEngine implementations (Code Entity Space).]

Sources: areal/api/cli_args.py163-211 areal/api/io_struct.py28-34 areal/api/workflow_api.py14-39 areal/engine/fsdp_engine.py87 areal/engine/megatron_engine.py84 areal/experimental/engine/archon_engine.py83

OpenAI API Format Conversion

The GenerationHyperparameters class provides methods to convert to OpenAI-compatible API formats areal/api/cli_args.py234-312. This is critical for agentic RL, where external tools or frameworks expect standard OpenAI schemas.

Conversion Methods

Method	Target API	Primary Key for Tokens
`to_openai_completions_args_dict()`	Chat Completions	`max_completion_tokens` areal/api/cli_args.py234-243
`to_openai_responses_args_dict()`	Responses API	`max_output_tokens` areal/api/cli_args.py245-254
`to_openai_agents_model_settings_dict()`	Agents Model Settings	`max_tokens` areal/api/cli_args.py256-265

The underlying to_openai_args_dict(api_format) method areal/api/cli_args.py267-312 handles the translation of AReaL parameters to their respective OpenAI equivalents.

Sources: areal/api/cli_args.py234-312
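The main difference between the three targets is the name of the token-limit key. A minimal sketch of that dispatch is shown below. The api_format strings, the GenConfig stand-in, and the choice to map max_new_tokens onto the token key are assumptions made for illustration; only the three key names come from the table above.

```python
from dataclasses import dataclass

# Token-limit key per target API, as listed in the table above.
# The dictionary keys ("completions", ...) are hypothetical format names.
TOKEN_KEY = {
    "completions": "max_completion_tokens",
    "responses": "max_output_tokens",
    "agents": "max_tokens",
}


@dataclass
class GenConfig:
    """Hypothetical subset of GenerationHyperparameters fields."""

    max_new_tokens: int = 16384
    temperature: float = 1.0
    top_p: float = 1.0


def to_openai_args(cfg: GenConfig, api_format: str) -> dict:
    """Translate the config into keyword arguments for one OpenAI-style API."""
    if api_format not in TOKEN_KEY:
        raise ValueError(f"unknown api_format: {api_format!r}")
    return {
        TOKEN_KEY[api_format]: cfg.max_new_tokens,  # assumed source field
        "temperature": cfg.temperature,
        "top_p": cfg.top_p,
    }


print(to_openai_args(GenConfig(max_new_tokens=512), "responses"))
```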

Unsupported Parameters and Warnings

When converting to OpenAI formats, certain AReaL-specific parameters cannot be represented. The system logs warnings areal/api/cli_args.py289-305 if the following are used:

min_new_tokens (OpenAI has no direct equivalent)
greedy (should use temperature=0.0)
top_k (not supported by OpenAI)
stop_token_ids (OpenAI uses string stop sequences)
ignore_eos
lora_name (passed separately in AReaL)

Sources: areal/api/cli_args.py289-305
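A warning pass of this kind could look roughly like the sketch below. It treats a parameter as "used" when its value differs from the documented default, which is an assumption; the actual condition in cli_args.py may differ. UnsupportedFields and warn_unsupported are hypothetical names.

```python
import logging
from dataclasses import dataclass, field, fields

logger = logging.getLogger("gconfig_sketch")


@dataclass
class UnsupportedFields:
    """Hypothetical holder for the fields listed above, with documented defaults."""

    min_new_tokens: int = 0
    greedy: bool = False
    top_k: int = 100_000_000
    stop_token_ids: list[int] = field(default_factory=list)
    ignore_eos: bool = False
    lora_name: str = "default_lora"


def warn_unsupported(cfg: UnsupportedFields) -> list[str]:
    """Log a warning for each non-default field and return the field names."""
    defaults = UnsupportedFields()
    used = [
        f.name
        for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(defaults, f.name)
    ]
    for name in used:
        logger.warning("%s cannot be expressed in the OpenAI format", name)
    return used


print(warn_unsupported(UnsupportedFields(greedy=True, top_k=50)))
```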

Integration with Inference Engines

The hyperparameters are utilized by the DistRolloutCoordinator to configure the remote inference backends via ModelRequest objects areal/api/io_struct.py28-34

Backend-Specific Implementation

SGLangBackend: Maps GenerationHyperparameters into a sampling_params dictionary areal/engine/sglang_remote.py56-65. It handles greedy by setting temperature to 0.0 areal/engine/sglang_remote.py60.
VLLMBackend: Maps parameters to a flat payload structure areal/engine/vllm_remote.py52-64. It explicitly includes use_beam_search in the request payload areal/engine/vllm_remote.py62 and handles both the /v1/completions and /v1/chat/completions endpoints areal/engine/vllm_remote.py93-96.

Sources: areal/engine/sglang_remote.py40-128 areal/engine/vllm_remote.py41-127 areal/api/io_struct.py28-34
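The structural difference between the two payloads (nested sampling_params versus flat fields) can be sketched as follows. The payload key names, the GenConfig stand-in, and both builder functions are illustrative assumptions rather than the actual backend code. The page does not describe how the vLLM backend treats greedy, so the sketch passes temperature through unchanged there.

```python
from dataclasses import dataclass, field


@dataclass
class GenConfig:
    """Hypothetical subset of GenerationHyperparameters fields."""

    max_new_tokens: int = 16384
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 100_000_000
    greedy: bool = False
    use_beam_search: bool = False
    stop_token_ids: list[int] = field(default_factory=list)


def sglang_payload(input_ids: list[int], cfg: GenConfig) -> dict:
    """Nested layout: generation settings live under 'sampling_params'."""
    return {
        "input_ids": input_ids,
        "sampling_params": {
            "max_new_tokens": cfg.max_new_tokens,
            "temperature": 0.0 if cfg.greedy else cfg.temperature,
            "top_p": cfg.top_p,
            "top_k": cfg.top_k,
            "stop_token_ids": list(cfg.stop_token_ids),
        },
    }


def vllm_payload(prompt_ids: list[int], cfg: GenConfig) -> dict:
    """Flat layout: generation settings sit at the top level of the request."""
    return {
        "prompt": prompt_ids,
        "max_tokens": cfg.max_new_tokens,
        "temperature": cfg.temperature,  # greedy handling not covered here
        "top_p": cfg.top_p,
        "top_k": cfg.top_k,
        "use_beam_search": cfg.use_beam_search,
        "stop_token_ids": list(cfg.stop_token_ids),
    }
```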


URL: https://deepwiki.com/inclusionAI/AReaL/2.6-generation-hyperparameters