RLAnything & DemyAgent: Open-Source RL for LLMs and Agentic Scenarios โข 12 items โข Updated โข 7
YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Demystifying Reinforcement Learning in Agentic Reasoning
๐ Paper on arXiv
๐ Open-AgentRL on GitHub
๐ 30K RL Dataset
๐ DemyAgent-4B Model
๐ฏ About This Repository
This repository contains the Qwen3-4B-RA-SFT model weights, a 4B-sized agentic reasoning model that is finetuned with our 3k Agentic SFT dataset, based on Qwen3-4B-Instruct-2507.
๐ Introduction
In our work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal
- ๐ฏ Data Quality Matters: Real end-to-end trajectories and high-diversity datasets significantly outperform synthetic alternatives
- โก Training Efficiency: Exploration-friendly techniques like reward clipping and entropy maintenance boost training efficiency
- ๐ง Reasoning Strategy: Deliberative reasoning with selective tool calls surpasses frequent invocation or verbose self-reasoning We contribute high-quality SFT and RL datasets, demonstrating that simple recipes enable even 4B models to outperform 32B models on the most challenging reasoning benchmarks.
๐ฆ Resources
| Type | Name | Link |
|---|---|---|
| ๐ Dataset | 3K Agentic SFT Data | ๐ค HuggingFace |
| ๐ Dataset | 30K Agentic RL Data | ๐ค HuggingFace |
| ๐ค Model | Qwen2.5-7B-RA-SFT | ๐ค HuggingFace |
| ๐ค Model | Qwen3-4B-RA-SFT | ๐ค HuggingFace |
| ๐ค Model | DemyAgent-4B | ๐ค HuggingFace |
๐ Citation
@article{yu2025demystify,
title={Demystifying Reinforcement Learning in Agentic Reasoning},
author={Yu, Zhaochen and Yang, Ling and Zou, Jiaru and Yan, Shuicheng and Wang, Mengdi},
journal={arXiv preprint arXiv:2510.11701},
year={2025}
}
- Downloads last month
- 21
Safetensors
Model size
4B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐ Ask for provider support
