Last indexed: 7 May 2026 (2e12c1)

Examples and Tutorials

This page provides practical walkthroughs of AReaL's example scripts, demonstrating common training scenarios including mathematical reasoning, vision-language models, agentic RL, and distributed training.

For installation and setup, see page 1.2. For configuration reference, see page 2. For algorithm details, see page 7.

Overview

AReaL includes example scripts in the examples/ directory covering various training scenarios. Each example demonstrates different features of the system and can be adapted for custom use cases.

Available Examples

Task	Description	Key Features	Example Directory
Math	GSM8K problem solving	GRPO, PPO, DAPO, SFT	`examples/math/`
Multi-Turn Math	Iterative problem refinement	Multi-turn conversations, reward discounting	`examples/multi_turn_math/`
VLM Training	Vision-language reasoning	Multi-modal inputs, image processing	`examples/vlm/`
Countdown	Custom task with reward function	Custom workflows, reward engineering	`examples/countdown/`
TIR	Tool-integrated reasoning	Tool calling, multi-step reasoning	`examples/tir/`
Scaffolding	Modular RL composition	Controller/Worker patterns, TrajectoryMaker	`examples/scaffolding/`
Terminal Bench	CLI/Terminal agent training	Docker environments, CAMEL integration	`examples/terminal_bench/`
Search Agent	End-to-end search agent	Web browsing, LLM judge	`examples/search_agent/`
Tau2	Customer service agent	Multi-domain conversations, user simulation	`examples/tau2/`
Alignment	Reward model training	Preference learning, Bradley-Terry loss	`examples/alignment/`

Sources: examples/countdown/train.py:1-101, examples/vlm/clevr_count_70k_grpo.py:1-75, examples/math/boba_grpo.py:1-91, examples/tir/train_tir.py:1-67

Running Examples

All example scripts follow a consistent pattern: each one uses the PPOTrainer class (areal/trainer/ppo/ppo_trainer.py:43-64) for RL algorithms or SFTTrainer (areal/trainer/sft/sft_trainer.py:18-25) for supervised fine-tuning. The trainer handles launcher selection and distributed execution internally.

Command Structure

Sources: examples/countdown/train.py:80-97, examples/math/boba_grpo.py:63-87, examples/tir/train_tir.py:27-62, examples/vlm/clevr_count_70k_sft.py:27-32

GSM8K Math Reasoning

The examples/math/ directory contains scripts for training mathematical reasoning models. For a deep dive, see GSM8K Math Reasoning.

GRPO Training

Group Relative Policy Optimization (GRPO) samples multiple trajectories per prompt and computes each trajectory's advantage relative to the others in its group. The gsm8k_grpo.yaml config sets the group size with n_samples and the reward normalization levels with reward_norm (examples/math/gsm8k_grpo.yaml:36-78).
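The group-relative advantage can be illustrated with a minimal sketch. It follows the standard GRPO formulation (subtract the group mean, divide by the group standard deviation) and is not taken from AReaL's code. The function name and the eps parameter are illustrative, and how reward_norm selects the mean and standard deviation levels is not shown here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's samples against each other."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt with a group of four sampled answers and binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct samples receive positive advantages and incorrect ones negative. When all samples in a group get the same reward, every advantage is zero and that prompt contributes no learning signal.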

The boba_grpo.py script uses RLVRWorkflow (areal/workflow/rlvr.py:24-42) to manage these group samples and get_math_verify_worker (areal/reward/__init__.py:1-10) for correctness checking.
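To make the idea of a verifiable correctness reward concrete, here is a simplified stand-in. It is not the verifier returned by get_math_verify_worker, whose behavior is not shown on this page. Both function names are hypothetical, and the sketch only compares the last number in the completion with the last number in the reference (GSM8K reference solutions end with a line of the form `#### <answer>`).

```python
import re


def extract_final_number(text):
    """Return the last number found in text as a float, or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,234" before converting.
    return float(matches[-1].replace(",", ""))


def correctness_reward(completion, reference_answer):
    """Return 1.0 when the final numbers agree, otherwise 0.0."""
    predicted = extract_final_number(completion)
    expected = extract_final_number(reference_answer)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if abs(predicted - expected) < 1e-6 else 0.0


reward = correctness_reward("48 + 24 = 72, so she sold 72 clips.", "#### 72")
```

A production verifier would also need to handle symbolic answers, fractions, and units, which this sketch ignores.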

Sources: examples/math/gsm8k_grpo.yaml:36, examples/math/gsm8k_grpo.yaml:75-78, examples/math/boba_grpo.py:44-59, examples/math/boba_grpo.py:77-87

Vision-Language Model Training

Training VLMs requires handling image inputs and specialized processors. The clevr_count_70k_grpo.py example uses VisionRLVRWorkflow (areal/workflow/vision_rlvr.py:24-40) with custom reward functions for visual counting tasks. For details, see Vision-Language Models.

VLM Training Flow

Sources: examples/vlm/clevr_count_70k_grpo.py:34-41, examples/vlm/clevr_count_70k_grpo.py:60-70

Agentic RL Workflows

AReaL supports complex agentic behaviors through proxy-based and direct integration patterns. For details, see Agent Workflows.

Tool-Integrated Reasoning (TIR)

The TIR example uses TIRWorkflow (examples/tir/tir_workflow.py:10-50) to train models to call tools (Python, Calculator) during multi-step reasoning. It is configured through a specialized TIRGRPOConfig (examples/tir/tir_workflow.py:10). For details, see Tool-Integrated Reasoning (TIR).
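The general shape of a tool-integrated rollout can be sketched as a loop that alternates generation and tool execution. This is an illustration only and does not reproduce TIRWorkflow. The `<calc>` and `<result>` tags, the scripted_model stub, and all function names are hypothetical, and a restricted arithmetic evaluator stands in for the Calculator tool.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def safe_calc(expr):
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


# A tool call is pending only when it is the last thing in the context.
CALL = re.compile(r"<calc>(.*?)</calc>\s*$", re.S)


def run_episode(generate, prompt, max_tool_calls=4):
    """Alternate generation and tool execution until no call is pending."""
    context = prompt
    for _ in range(max_tool_calls):
        context += generate(context)
        call = CALL.search(context)
        if call is None:
            break  # the model produced its final answer
        context += f"<result>{safe_calc(call.group(1))}</result>"
    return context


def scripted_model(context):
    """Stand-in for a policy model: call the tool once, then answer."""
    if "<result>" not in context:
        return "<calc>12 * (3 + 4)</calc>"
    return " The answer is 84."


transcript = run_episode(scripted_model, "What is 12 * (3 + 4)? ")
```

In RL training, tokens inserted by the tool are typically excluded from the loss so that only model-generated tokens are optimized. Whether and how TIRWorkflow does this is not shown here.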

Sources: examples/tir/train_tir.py:10, examples/tir/train_tir.py:44-62

Scaffolding Framework

The examples/scaffolding/ directory provides a modular framework for composing RL workflows using a Controller/Worker pattern. This allows for complex trajectory construction logic that is decoupled from the main training loop. For details, see Scaffolding Framework.

Custom Workflow Example: Countdown

The countdown example implements a custom RolloutWorkflow (areal/api/workflow.py:15-30) from scratch, manually handling ModelRequest (areal/api/request.py:10-25) and ModelResponse (areal/api/response.py:10-25) through the InferenceEngine interface (areal/api/engine.py:45-60).

Countdown Workflow Implementation

Sources: examples/countdown/train.py:20-67

Implementation Detail

The workflow must return a dictionary containing tensors for input_ids, loss_mask, logprobs, and rewards. CountDownWorkflow prepares these after calling engine.agenerate (examples/countdown/train.py:42-67).
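The layout of such a dictionary can be sketched as follows. To stay dependency-free, the sketch uses plain Python lists where the real workflow returns tensors. The build_trajectory helper and its argument names are hypothetical, and the convention assumed here (mask and log-probabilities zeroed over prompt tokens) is a common one rather than a confirmed detail of CountDownWorkflow.

```python
def build_trajectory(prompt_ids, output_ids, output_logprobs, reward):
    """Assemble per-token fields for one prompt/completion pair."""
    if len(output_ids) != len(output_logprobs):
        raise ValueError("one logprob is required per generated token")
    n_prompt, n_out = len(prompt_ids), len(output_ids)
    return {
        # Full sequence: prompt tokens followed by generated tokens.
        "input_ids": prompt_ids + output_ids,
        # Only generated tokens contribute to the policy loss.
        "loss_mask": [0] * n_prompt + [1] * n_out,
        # Prompt positions have no sampling logprob, so pad them with zeros.
        "logprobs": [0.0] * n_prompt + output_logprobs,
        # One scalar reward for the whole trajectory.
        "rewards": reward,
    }


trajectory = build_trajectory(
    prompt_ids=[101, 7, 8],
    output_ids=[42, 43],
    output_logprobs=[-0.1, -0.7],
    reward=1.0,
)
```

Every per-token field has the same length as input_ids, which lets the trainer align masks and log-probabilities with tokens position by position.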

Sources: examples/countdown/train.py:58-67

Additional Tutorials

Multi-Node Training: Scaling to clusters with Ray/Slurm. See Multi-Node Training.
Custom Models: Integrating new architectures into ArchonEngine (areal/engine/archon/archon_engine.py:25-50). See Adding Custom Models to ArchonEngine.
Customer Service Agents: Training with Tau2-Bench. See Customer Service Agents (Tau2).
Reward Model Training: Preference learning with Bradley-Terry loss. See Reward Model Training.
OpenClaw and External Runtimes: Using the proxy gateway for human-in-the-loop and external agent training. See OpenClaw and External Agent Runtimes.
Search Agent and Deep Research: Training search agents with LLM judges. See Search Agent and Deep Research.
Terminal Bench: Training agents in Docker-based terminal environments. See Terminal Bench Agent Training.
Multi-turn Math: Training agents with conversation trees and discounting. See Multi-turn Math Training.
Evaluation Guide: Running distributed evaluation on models. See Evaluation Guide.
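For reference, the Bradley-Terry loss named in the Reward Model Training entry above is the negative log-likelihood that the preferred response outscores the rejected one, -log(sigmoid(s_chosen - s_rejected)). The sketch below computes it for a single pair of scalar scores. It is a generic formulation rather than the code in examples/alignment/, and the function name is illustrative.

```python
import math


def bradley_terry_loss(chosen_score, rejected_score):
    """Negative log-likelihood that the chosen response beats the rejected one."""
    margin = chosen_score - rejected_score
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


loss_tied = bradley_terry_loss(0.0, 0.0)      # equal scores: log(2)
loss_correct = bradley_terry_loss(2.0, 0.0)   # chosen ranked higher: small loss
loss_wrong = bradley_terry_loss(0.0, 2.0)     # chosen ranked lower: large loss
```

The loss depends only on the score difference, so the reward model is free to shift all scores by a constant.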


URL: https://deepwiki.com/inclusionAI/AReaL/14-examples-and-tutorials