
URL: https://deepwiki.com/inclusionAI/AReaL/14-examples-and-tutorials

Examples and Tutorials | inclusionAI/AReaL | DeepWiki


Last indexed: 7 May 2026 (2e12c1)

Examples and Tutorials

This page provides practical walkthroughs of AReaL's example scripts, demonstrating common training scenarios including mathematical reasoning, vision-language models, agentic RL, and distributed training.

For installation and setup, see page 1.2. For configuration reference, see page 2. For algorithm details, see page 7.


Overview

AReaL includes example scripts in the examples/ directory covering various training scenarios. Each example demonstrates different features of the system and can be adapted for custom use cases.

Available Examples

Task | Description | Key Features | Example Directory
Math | GSM8K problem solving | GRPO, PPO, DAPO, SFT | examples/math/
Multi-Turn Math | Iterative problem refinement | Multi-turn conversations, reward discounting | examples/multi_turn_math/
VLM Training | Vision-language reasoning | Multi-modal inputs, image processing | examples/vlm/
Countdown | Custom task with reward function | Custom workflows, reward engineering | examples/countdown/
TIR | Tool-integrated reasoning | Tool calling, multi-step reasoning | examples/tir/
Scaffolding | Modular RL composition | Controller/Worker patterns, TrajectoryMaker | examples/scaffolding/
Terminal Bench | CLI/terminal agent training | Docker environments, CAMEL integration | examples/terminal_bench/
Search Agent | End-to-end search agent | Web browsing, LLM judge | examples/search_agent/
Tau2 | Customer service agent | Multi-domain conversations, user simulation | examples/tau2/
Alignment | Reward model training | Preference learning, Bradley-Terry loss | examples/alignment/

Sources: examples/countdown/train.py (lines 1-101), examples/vlm/clevr_count_70k_grpo.py (lines 1-75), examples/math/boba_grpo.py (lines 1-91), examples/tir/train_tir.py (lines 1-67)


Running Examples

All example scripts follow a consistent pattern: they build on the PPOTrainer class (areal/trainer/ppo/ppo_trainer.py, lines 43-64) for RL algorithms or the SFTTrainer class (areal/trainer/sft/sft_trainer.py, lines 18-25) for supervised fine-tuning. Both trainers internally handle launcher selection and distributed execution.

Command Structure


Sources: examples/countdown/train.py (lines 80-97), examples/math/boba_grpo.py (lines 63-87), examples/tir/train_tir.py (lines 27-62), examples/vlm/clevr_count_70k_sft.py (lines 27-32)


GSM8K Math Reasoning

The examples/math/ directory contains scripts for training mathematical reasoning models. For a deep dive, see GSM8K Math Reasoning.

GRPO Training

Group Relative Policy Optimization (GRPO) samples multiple trajectories per prompt and computes each trajectory's advantage relative to the others in its group. The gsm8k_grpo.yaml config sets n_samples (the group size) and the reward normalization levels, reward_norm (examples/math/gsm8k_grpo.yaml, lines 36-78).
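The group-relative advantage can be sketched in a few lines of plain Python. This is a minimal illustration, not AReaL's implementation: it assumes one scalar reward per trajectory, stores the n_samples trajectories of each prompt contiguously, and normalizes by the group's mean and standard deviation. The function name group_relative_advantages and the eps parameter are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, n_samples, eps=1e-6):
    """Hypothetical helper: GRPO-style advantages for a flat list of rewards.

    Assumes len(rewards) == num_prompts * n_samples, with the n_samples
    trajectories for each prompt stored next to each other.
    """
    if len(rewards) % n_samples != 0:
        raise ValueError("reward count must be a multiple of n_samples")
    advantages = []
    for start in range(0, len(rewards), n_samples):
        group = rewards[start:start + n_samples]
        mu = mean(group)
        sigma = pstdev(group)  # population standard deviation of the group
        # Score each trajectory relative to its siblings for the same prompt.
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Two prompts, four samples each, with binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards, n_samples=4)])
# prints [1.0, -1.0, -1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Note that a group in which every sample gets the same reward yields zero advantage, so only prompts with mixed outcomes produce a learning signal. This sketch shows only per-group normalization; which level AReaL normalizes at is governed by its reward_norm setting, and some GRPO variants skip the division by the standard deviation altogether.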


The boba_grpo.py script uses RLVRWorkflow (areal/workflow/rlvr.py, lines 24-42) to manage these group samples and get_math_verify_worker (areal/reward/__init__.py, lines 1-10) to check answer correctness.
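To show the shape of a verifiable math reward, here is a deliberately simple, hypothetical stand-in. It does not reproduce get_math_verify_worker; it assumes the model marks its final answer with either \boxed{...} or the GSM8K-style "#### ..." suffix, compares numbers exactly via fractions, and returns 1.0 or 0.0. The names extract_final_answer and math_reward are illustrative.

```python
import re
from fractions import Fraction


def extract_final_answer(completion):
    """Return the last \\boxed{...} or '#### ...' answer, or None if absent."""
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if boxed:
        return boxed[-1].strip()
    hashed = re.findall(r"####\s*([^\n]+)", completion)
    if hashed:
        return hashed[-1].strip()
    return None


def _to_number(text):
    # Drop thousands separators, dollar signs, and a trailing period.
    cleaned = text.replace(",", "").replace("$", "").rstrip(".")
    try:
        return Fraction(cleaned)  # exact: "0.5" and "1/2" compare equal
    except (ValueError, ZeroDivisionError):
        return None


def math_reward(completion, ground_truth):
    """Hypothetical binary reward: 1.0 if the final answer matches, else 0.0."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    pred_num, gt_num = _to_number(predicted), _to_number(ground_truth)
    if pred_num is not None and gt_num is not None:
        return 1.0 if pred_num == gt_num else 0.0
    # Fall back to exact string comparison for non-numeric answers.
    return 1.0 if predicted == ground_truth.strip() else 0.0


print(math_reward("9 * 2 = 18. #### 18", "18"))            # 1.0
print(math_reward(r"The total is \boxed{1,000}.", "1000"))  # 1.0
print(math_reward("I think it is 17", "18"))               # 0.0
```

A real verifier handles far more answer forms (symbolic expressions, units, LaTeX fractions); the point here is only that the reward is computed by checking, not by a learned model.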

Sources: examples/math/gsm8k_grpo.yaml (lines 36, 75-78), examples/math/boba_grpo.py (lines 44-59, 77-87)


Vision-Language Model Training

Training VLMs requires handling image inputs and specialized processors. The clevr_count_70k_grpo.py example demonstrates using VisionRLVRWorkflow (areal/workflow/vision_rlvr.py, lines 24-40) with custom reward functions for visual counting tasks. For details, see Vision-Language Models.
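The sketch below shows the kind of custom reward function a visual counting task could plug in. It is a hypothetical example, not the reward used in clevr_count_70k_grpo.py: it assumes the image has already been consumed upstream by the processor, so the reward only sees the decoded text answer, and it treats the last integer in that answer as the model's count.

```python
import re


def counting_reward(completion, true_count):
    """Hypothetical reward for a prompt like 'How many objects are there?'.

    Takes the last integer written in the answer and returns 1.0 if it
    equals the reference count, otherwise 0.0.
    """
    numbers = re.findall(r"-?\d+", completion)
    if not numbers:
        return 0.0  # no parsable count means no reward
    return 1.0 if int(numbers[-1]) == int(true_count) else 0.0


print(counting_reward("I see 3 cubes and 2 spheres, so 5 objects in total.", 5))  # 1.0
print(counting_reward("There are seven objects.", 7))  # 0.0: number words are not parsed
```

The second call shows why answer-format conventions matter for this kind of reward: a correct answer spelled out in words earns nothing under this parser.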

VLM Training Flow


Sources: examples/vlm/clevr_count_70k_grpo.py (lines 34-41, 60-70)


Agentic RL Workflows

AReaL supports complex agentic behaviors through proxy-based and direct integration patterns. For details, see Agent Workflows.

Tool-Integrated Reasoning (TIR)

The TIR example uses TIRWorkflow (examples/tir/tir_workflow.py, lines 10-50) to train models to call tools (Python, Calculator) during multi-step reasoning, and it is configured through a specialized TIRGRPOConfig (examples/tir/tir_workflow.py, line 10). For details, see Tool-Integrated Reasoning (TIR).
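Multi-step tool use boils down to a generate-then-execute loop. The following is a minimal, self-contained sketch of that loop, not TIRWorkflow itself: the <calculator>/<result> tag convention, the names safe_calculate and run_tool_loop, and the scripted stand-in model are all assumptions made for illustration.

```python
import ast
import operator
import re

# Hypothetical convention: the model wraps an arithmetic expression in
# <calculator>...</calculator> and the loop answers with <result>...</result>.
TOOL_PATTERN = re.compile(r"<calculator>(.*?)</calculator>", re.DOTALL)

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_calculate(expression):
    """Evaluate +, -, *, / arithmetic by walking the AST (no eval())."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


def run_tool_loop(generate, prompt, max_turns=4):
    """Alternate model generation and tool execution until no tool is called.

    `generate` maps the running context to the next model segment; in an RL
    setup this would be backed by the inference engine.
    """
    context = prompt
    for _ in range(max_turns):
        segment = generate(context)
        context += segment
        call = TOOL_PATTERN.search(segment)
        if call is None:
            break  # no tool call: the model has given its final answer
        try:
            result = safe_calculate(call.group(1))
        except (ValueError, SyntaxError, ZeroDivisionError) as err:
            result = f"error: {err}"  # feed errors back so the model can react
        context += f"<result>{result}</result>"
    return context


# A scripted stand-in for the policy model, for demonstration only.
scripted = iter([
    "I need 23 * 17. <calculator>23 * 17</calculator>",
    " So the answer is 391.",
])
print(run_tool_loop(lambda ctx: next(scripted), "Q: What is 23 * 17?\n"))
```

In RL training, tool-output tokens such as the <result> text are usually excluded from the policy loss because the model did not generate them; this sketch returns plain text and leaves that masking out.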


Sources: examples/tir/train_tir.py10 examples/tir/train_tir.py44-62

Scaffolding Framework

The examples/scaffolding/ directory provides a modular framework for composing RL workflows using a Controller/Worker pattern. This allows for complex trajectory construction logic that is decoupled from the main training loop. For details, see Scaffolding Framework.
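The Controller/Worker split can be illustrated with plain Python generators. Everything in this sketch (Task, the two worker classes, SingleTurnController, run_controller) is hypothetical and does not reproduce the classes in examples/scaffolding/; it only shows the idea that a controller decides which steps to take while workers perform them.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A unit of work a controller hands to a worker (hypothetical)."""
    kind: str      # which worker handles it, e.g. "generate" or "reward"
    payload: str
    result: object = None


class EchoGenerationWorker:
    """Stand-in for a worker that would call an inference engine."""

    def run(self, task):
        task.result = f"answer to: {task.payload}"


class LengthRewardWorker:
    """Stand-in for a worker that scores a completion (toy length rule)."""

    def run(self, task):
        task.result = float(len(task.payload) > 10)


class SingleTurnController:
    """Decides what to do next; never calls models or reward code itself."""

    def process(self, prompt):
        gen = Task("generate", prompt)
        yield gen                      # a worker fills in gen.result
        reward = Task("reward", gen.result)
        yield reward                   # a worker fills in reward.result
        return {"prompt": prompt, "completion": gen.result, "reward": reward.result}


def run_controller(controller, workers, prompt):
    """Drive a controller generator, dispatching each task by its kind."""
    flow = controller.process(prompt)
    try:
        task = next(flow)
        while True:
            workers[task.kind].run(task)
            task = flow.send(None)
    except StopIteration as done:
        return done.value  # the controller's assembled trajectory


workers = {"generate": EchoGenerationWorker(), "reward": LengthRewardWorker()}
print(run_controller(SingleTurnController(), workers, "2 + 2?"))
```

Swapping in a multi-turn controller would only change process(); the workers and the driver stay the same, which is the kind of decoupling from the training loop that the paragraph above describes.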


Custom Workflow Example: Countdown

The countdown example demonstrates how to implement a custom RolloutWorkflow (areal/api/workflow.py, lines 15-30) from scratch, manually handling ModelRequest (areal/api/request.py, lines 10-25) and ModelResponse (areal/api/response.py, lines 10-25) through the InferenceEngine interface (areal/api/engine.py, lines 45-60).
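The shape of such a workflow can be sketched without AReaL installed. In the hypothetical code below, Request, Response and ScriptedEngine only mirror the roles of ModelRequest, ModelResponse and InferenceEngine; their fields, the character-level toy tokenizer, and the countdown_reward rule (use each given number exactly once with + - * / to hit the target) are assumptions. The method name agenerate follows the text, but its signature here is assumed, and plain lists stand in for tensors.

```python
import ast
import asyncio
from collections import Counter
from dataclasses import dataclass

# Hypothetical stand-ins for ModelRequest / ModelResponse / InferenceEngine.


@dataclass
class Request:
    input_ids: list
    max_new_tokens: int = 32


@dataclass
class Response:
    output_ids: list
    output_logprobs: list
    text: str


def tokenize(text):
    return [ord(ch) for ch in text]  # toy tokenizer: one id per character


class ScriptedEngine:
    """Fake engine whose async agenerate always returns the same completion."""

    def __init__(self, completion):
        self.completion = completion

    async def agenerate(self, request):
        await asyncio.sleep(0)  # yield control, as a real remote call would
        text = self.completion[: request.max_new_tokens]
        ids = tokenize(text)
        return Response(output_ids=ids, output_logprobs=[-0.5] * len(ids), text=text)


def countdown_reward(expression, numbers, target):
    """1.0 if `expression` uses exactly `numbers` with + - * / and equals `target`."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return 0.0
    allowed = (ast.Expression, ast.BinOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div)
    if not all(isinstance(node, allowed) for node in ast.walk(tree)):
        return 0.0
    used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
    if Counter(used) != Counter(numbers):
        return 0.0
    try:
        # Safe to evaluate: the tree holds only numeric literals and + - * /.
        value = eval(compile(tree, "<countdown>", "eval"), {"__builtins__": {}})
    except ZeroDivisionError:
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0


async def countdown_episode(engine, numbers, target):
    """One rollout: build a request, await the engine, score, and package."""
    prompt_ids = tokenize(f"Using {numbers}, make {target}. Answer: ")
    response = await engine.agenerate(Request(input_ids=prompt_ids))
    reward = countdown_reward(response.text, numbers, target)
    n_out = len(response.output_ids)
    return {
        "input_ids": prompt_ids + response.output_ids,
        "loss_mask": [0] * len(prompt_ids) + [1] * n_out,  # train on output only
        "logprobs": [0.0] * len(prompt_ids) + response.output_logprobs,
        "rewards": reward,
    }


result = asyncio.run(countdown_episode(ScriptedEngine("(25 - 5) * 4"), [25, 5, 4], 80))
print(result["rewards"])  # 1.0
```

Keeping the engine behind an async method lets many episodes run concurrently (for example with asyncio.gather) while the workflow code stays sequential and easy to read.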

Countdown Workflow Implementation


Sources: examples/countdown/train.py (lines 20-67)

Implementation Detail

The workflow must return a dictionary containing tensors for input_ids, loss_mask, logprobs, and rewards. In CountDownWorkflow, these are prepared after calling engine.agenerate (examples/countdown/train.py, lines 42-67).
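Before a trajectory reaches the trainer, it can help to check the invariants this dictionary is expected to satisfy. The checker below is a hypothetical helper, not part of AReaL, written against plain Python lists. It assumes the four keys named above, that input_ids, loss_mask and logprobs are aligned token by token, and that rewards holds one scalar per trajectory; the optional contiguity check assumes a single-turn rollout.

```python
def validate_rollout(rollout, single_turn=True):
    """Raise ValueError if a single-trajectory rollout dict looks malformed.

    Returns the number of tokens on success.
    """
    required = ("input_ids", "loss_mask", "logprobs", "rewards")
    missing = [key for key in required if key not in rollout]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    n_tokens = len(rollout["input_ids"])
    for key in ("loss_mask", "logprobs"):
        if len(rollout[key]) != n_tokens:
            raise ValueError(f"{key} has {len(rollout[key])} entries, expected {n_tokens}")

    mask = rollout["loss_mask"]
    if any(m not in (0, 1) for m in mask):
        raise ValueError("loss_mask must contain only 0 and 1")
    if 1 not in mask:
        raise ValueError("loss_mask is all zeros: no generated tokens to train on")
    if single_turn:
        # In a single-turn rollout, prompt tokens (0) all precede generated
        # tokens (1). Multi-turn or tool-using rollouts may interleave them.
        if 0 in mask[mask.index(1):]:
            raise ValueError("prompt tokens appear after generated tokens")

    if not isinstance(rollout["rewards"], (int, float)):
        raise ValueError("rewards should be one scalar per trajectory")
    return n_tokens


ok = {
    "input_ids": [101, 102, 7, 8],
    "loss_mask": [0, 0, 1, 1],
    "logprobs": [0.0, 0.0, -0.3, -1.2],
    "rewards": 1.0,
}
print(validate_rollout(ok))  # 4
```

With real tensors the same checks would compare shapes instead of list lengths; pass single_turn=False for workflows that interleave prompt, tool, and model tokens.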


Sources: examples/countdown/train.py (lines 58-67)


Additional Tutorials