VOOZH about

URL: https://deepwiki.com/inclusionAI/AReaL/10.5-datasets-and-reward-functions

⇱ Datasets and Reward Functions | inclusionAI/AReaL | DeepWiki


Loading...
Last indexed: 7 May 2026 (2e12c1)
Menu

Datasets and Reward Functions

This page provides a technical deep dive into AReaL's built-in dataset support and the implementation of reward functions for Reinforcement Learning (RL), Supervised Fine-Tuning (SFT), and Direct Preference Optimization (DPO) training. AReaL supports a variety of modalities, including pure text (GSM8K), vision-language (Geometry3K, CLEVR, ViRL39K), and preference data (HH-RLHF).

Dataset Architecture

AReaL uses a centralized factory pattern to instantiate datasets based on the _DatasetConfig areal/api/cli_args.py5 provided in the experiment configuration. The primary entry point is get_custom_dataset areal/dataset/__init__.py165-206 which routes requests to specific loaders based on the dataset path and training type (SFT, RL, RW, or DPO).

Data Flow Overview

The following diagram illustrates how raw data from HuggingFace or local storage is transformed into model-ready tensors through the areal.dataset modules.

Dataset Transformation Pipeline


Sources: areal/dataset/__init__.py27-162 areal/dataset/__init__.py165-206 areal/dataset/gsm8k.py31-62

Built-in Datasets

AReaL supports a specific set of validated datasets defined in VALID_DATASETS areal/dataset/__init__.py15-22

Dataset NameModalitySupported TypesLoader Reference
gsm8kText (Math)SFT, RLget_gsm8k_rl_dataset areal/dataset/gsm8k.py31
geometry3kVision (Math)SFT, RLget_geometry3k_rl_dataset areal/dataset/__init__.py86
clevr_count_70kVision (Logic)SFT, RLget_clevr_count_70k_rl_dataset areal/dataset/__init__.py66
virl39kVision (Reasoning)RLget_virl39k_rl_dataset areal/dataset/__init__.py96
hh-rlhfText (Preference)RW, DPOget_hhrlhf_dpo_dataset areal/dataset/__init__.py116
torl_dataText (Math)RLget_torl_data_rl_dataset areal/dataset/torl_data.py71

Sources: areal/dataset/__init__.py15-22 areal/dataset/__init__.py36-135 areal/dataset/torl_data.py71-100

Vision-Language Datasets (VLM)

For vision-language models, AReaL implements specialized preprocessing to handle image resizing and chat template application.

  • Geometry3K: Used for geometric reasoning. The loader get_geometry3k_rl_dataset areal/dataset/__init__.py86 (defined in areal/dataset/geometry3k.py) injects instructions for internal monologue and boxed answers.
  • CLEVR Count: Focuses on object counting. The loader get_clevr_count_70k_rl_dataset areal/dataset/__init__.py66 (defined in areal/dataset/clevr_count_70k.py) converts images and applies system prompts for bracketed answers.
  • ViRL39K: A vision reasoning dataset supported for RL training via get_virl39k_rl_dataset areal/dataset/__init__.py96-105

Sources: areal/dataset/__init__.py56-105 areal/dataset/geometry3k.py1-42

Text-Based Reasoning (Math & Logic)

Sources: areal/dataset/gsm8k.py6-62 areal/dataset/torl_data.py21-100

Reward Function Implementation

Reward functions evaluate model completions against ground truth. AReaL provides built-in functions for math and vision tasks, accessible via get_custom_reward_fn areal/reward/__init__.py15-28

Math and Logic Verification

For reasoning tasks, AReaL uses regex-based extraction and specialized math verification logic.

Reward Execution Flow


Sources: areal/reward/__init__.py15-28 areal/reward/__init__.py91-110 areal/reward/clevr_count_70k.py18-32 areal/reward/geometry3k.py27-41

Preference Datasets (DPO and Reward Modeling)

For preference-based alignment, AReaL supports the hh-rlhf dataset via get_hhrlhf_rw_dataset areal/dataset/hhrlhf.py6-30 and get_hhrlhf_dpo_dataset areal/dataset/hhrlhf.py33-78

Sources: areal/dataset/hhrlhf.py6-78 examples/alignment/README.md15-109

Tool-Integrated Reasoning (TIR)

AReaL supports training agents that use tools (e.g., Python executors) during multi-turn reasoning.

Sources: examples/tir/README_zh.md10-90 areal/dataset/torl_data.py71-100