Last indexed: 7 May 2026 (2e12c1)

Datasets and Reward Functions

This page provides a technical deep dive into AReaL's built-in dataset support and the implementation of reward functions for Reinforcement Learning (RL), Supervised Fine-Tuning (SFT), and Direct Preference Optimization (DPO) training. AReaL supports a variety of modalities, including pure text (GSM8K), vision-language (Geometry3K, CLEVR, ViRL39K), and preference data (HH-RLHF).

Dataset Architecture

AReaL uses a centralized factory pattern to instantiate datasets based on the _DatasetConfig areal/api/cli_args.py5 provided in the experiment configuration. The primary entry point is get_custom_dataset areal/dataset/__init__.py165-206 which routes requests to specific loaders based on the dataset path and training type (SFT, RL, RW, or DPO).

Data Flow Overview

The following diagram illustrates how raw data from HuggingFace or local storage is transformed into model-ready tensors through the areal.dataset modules.

Dataset Transformation Pipeline

Sources: areal/dataset/__init__.py27-162 areal/dataset/__init__.py165-206 areal/dataset/gsm8k.py31-62

Built-in Datasets

AReaL supports a specific set of validated datasets defined in VALID_DATASETS areal/dataset/__init__.py15-22

Dataset Name	Modality	Supported Types	Loader Reference
`gsm8k`	Text (Math)	SFT, RL	`get_gsm8k_rl_dataset` areal/dataset/gsm8k.py31
`geometry3k`	Vision (Math)	SFT, RL	`get_geometry3k_rl_dataset` areal/dataset/__init__.py86
`clevr_count_70k`	Vision (Logic)	SFT, RL	`get_clevr_count_70k_rl_dataset` areal/dataset/__init__.py66
`virl39k`	Vision (Reasoning)	RL	`get_virl39k_rl_dataset` areal/dataset/__init__.py96
`hh-rlhf`	Text (Preference)	RW, DPO	`get_hhrlhf_dpo_dataset` areal/dataset/__init__.py116
`torl_data`	Text (Math)	RL	`get_torl_data_rl_dataset` areal/dataset/torl_data.py71

Sources: areal/dataset/__init__.py15-22 areal/dataset/__init__.py36-135 areal/dataset/torl_data.py71-100

Vision-Language Datasets (VLM)

For vision-language models, AReaL implements specialized preprocessing to handle image resizing and chat template application.

Geometry3K: Used for geometric reasoning. The loader get_geometry3k_rl_dataset areal/dataset/__init__.py86 (defined in areal/dataset/geometry3k.py) injects instructions for internal monologue and boxed answers.
CLEVR Count: Focuses on object counting. The loader get_clevr_count_70k_rl_dataset areal/dataset/__init__.py66 (defined in areal/dataset/clevr_count_70k.py) converts images and applies system prompts for bracketed answers.
ViRL39K: A vision reasoning dataset supported for RL training via get_virl39k_rl_dataset areal/dataset/__init__.py96-105

Sources: areal/dataset/__init__.py56-105 areal/dataset/geometry3k.py1-42

Text-Based Reasoning (Math & Logic)

GSM8K: The standard math word problem dataset. get_gsm8k_sft_dataset areal/dataset/gsm8k.py6-28 creates a loss_mask to train only on the completion, while get_gsm8k_rl_dataset areal/dataset/gsm8k.py31-62 formats questions into a message list with a final answer boxed prompt.
ToRL Data: A math dataset that requires a download step prepare_torl_data areal/dataset/torl_data.py43-60 It loads parquet files and wraps ground truth in \boxed{} areal/dataset/torl_data.py81-85

Sources: areal/dataset/gsm8k.py6-62 areal/dataset/torl_data.py21-100

Reward Function Implementation

Reward functions evaluate model completions against ground truth. AReaL provides built-in functions for math and vision tasks, accessible via get_custom_reward_fn areal/reward/__init__.py15-28

Math and Logic Verification

For reasoning tasks, AReaL uses regex-based extraction and specialized math verification logic.

CLEVR Count Reward: Uses extract_answer with regex \[([0-9\.]+)\] to find bracketed numbers areal/reward/clevr_count_70k.py10-15 The clevr_count_70k_reward_fn performs a string-based equality check areal/reward/clevr_count_70k.py18-32
Geometry3K Reward: Similar to CLEVR but utilizes the MathVerifyWorker areal/reward/geometry3k.py37-38
MathVerifyWorker: A thin wrapper over the math_verify library areal/reward/__init__.py31-110 It uses parse() and verify() to compare model outputs with ground truth, supporting LaTeX and numeric expressions with configurable precision areal/reward/__init__.py83-89 It uses a ThreadPoolExecutor to enforce thread-safe timeouts areal/reward/__init__.py95-97

Reward Execution Flow

Sources: areal/reward/__init__.py15-28 areal/reward/__init__.py91-110 areal/reward/clevr_count_70k.py18-32 areal/reward/geometry3k.py27-41

Preference Datasets (DPO and Reward Modeling)

For preference-based alignment, AReaL supports the hh-rlhf dataset via get_hhrlhf_rw_dataset areal/dataset/hhrlhf.py6-30 and get_hhrlhf_dpo_dataset areal/dataset/hhrlhf.py33-78

Reward Model (RW) Training: The loader prepares chosen_ids and rejected_ids areal/dataset/hhrlhf.py14-17 Training uses the Bradley-Terry loss examples/alignment/README.md23-26
DPO Training: The loader creates loss_mask fields for both chosen and rejected sequences, masking out the prompt areal/dataset/hhrlhf.py65-68 AReaL supports sigmoid and ipo loss types examples/alignment/README.md93-94 Reference log-probabilities are computed online via a colocated reference engine examples/alignment/README.md73-77

Sources: areal/dataset/hhrlhf.py6-78 examples/alignment/README.md15-109

Tool-Integrated Reasoning (TIR)

AReaL supports training agents that use tools (e.g., Python executors) during multi-turn reasoning.

TIRWorkflow: Manages the multi-turn interaction loop, detecting tool calls in model output and executing them via a ToolManager examples/tir/README_zh.md12-20
ToolManager: Coordinates tools like the PythonTool (executes code) and CalculatorTool (basic math) examples/tir/README_zh.md21-37
Mechanism: The workflow pauses generation when a tool call marker (e.g., <calculator>) is detected, executes the tool, appends the result to the conversation, and continues generation examples/tir/README_zh.md82-90
Data: TIR training often uses the torl_data dataset examples/tir/README_zh.md102-109

Sources: examples/tir/README_zh.md10-90 areal/dataset/torl_data.py71-100

Refresh this wiki

URL: https://deepwiki.com/inclusionAI/AReaL/10.5-datasets-and-reward-functions