Demystifying Reinforcement Learning in Agentic Reasoning

👁 Paper on arXiv
👁 Open-AgentRL on GitHub
👁 30K RL Dataset
👁 DemyAgent-4B Model

🎯 About This Repository

This repository contains the Qwen3-4B-RA-SFT model weights, a 4B-sized agentic reasoning model that is finetuned with our 3k Agentic SFT dataset, based on Qwen3-4B-Instruct-2507.

🌟 Introduction

In our work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal

🎯 Data Quality Matters: Real end-to-end trajectories and high-diversity datasets significantly outperform synthetic alternatives
⚡ Training Efficiency: Exploration-friendly techniques like reward clipping and entropy maintenance boost training efficiency
🧠 Reasoning Strategy: Deliberative reasoning with selective tool calls surpasses frequent invocation or verbose self-reasoning We contribute high-quality SFT and RL datasets, demonstrating that simple recipes enable even 4B models to outperform 32B models on the most challenging reasoning benchmarks.

📦 Resources

Type	Name	Link
📊 Dataset	3K Agentic SFT Data	🤗 HuggingFace
📊 Dataset	30K Agentic RL Data	🤗 HuggingFace
🤖 Model	Qwen2.5-7B-RA-SFT	🤗 HuggingFace
🤖 Model	Qwen3-4B-RA-SFT	🤗 HuggingFace
🤖 Model	DemyAgent-4B	🤗 HuggingFace

📝 Citation

@article{yu2025demystify,
 title={Demystifying Reinforcement Learning in Agentic Reasoning},
 author={Yu, Zhaochen and Yang, Ling and Zou, Jiaru and Yan, Shuicheng and Wang, Mengdi},
 journal={arXiv preprint arXiv:2510.11701},
 year={2025}
}

Downloads last month: 21

Safetensors

Model size

4B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Gen-Verse/Qwen3-4B-RA-SFT

Merges

34 models

Quantizations

2 models

Collection including Gen-Verse/Qwen3-4B-RA-SFT

RLAnything & DemyAgent: Open-Source RL for LLMs and Agentic Scenarios • 12 items • Updated Feb 3 • 7

Paper for Gen-Verse/Qwen3-4B-RA-SFT

Paper • 2510.11701 • Published Oct 13, 2025 • 33

URL: https://huggingface.co/Gen-Verse/Qwen3-4B-RA-SFT

⇱ Gen-Verse/Qwen3-4B-RA-SFT · Hugging Face

Demystifying Reinforcement Learning in Agentic Reasoning

👁 Paper on arXiv
👁 Open-AgentRL on GitHub
👁 30K RL Dataset
👁 DemyAgent-4B Model

🎯 About This Repository

🌟 Introduction

📦 Resources

📝 Citation

Model tree for Gen-Verse/Qwen3-4B-RA-SFT

Collection including Gen-Verse/Qwen3-4B-RA-SFT

Paper for Gen-Verse/Qwen3-4B-RA-SFT

URL: https://huggingface.co/Gen-Verse/Qwen3-4B-RA-SFT

⇱ Gen-Verse/Qwen3-4B-RA-SFT · Hugging Face

Demystifying Reinforcement Learning in Agentic Reasoning

👁 Paper on arXiv 👁 Open-AgentRL on GitHub 👁 30K RL Dataset 👁 DemyAgent-4B Model

🎯 About This Repository

🌟 Introduction

📦 Resources

📝 Citation

Model tree for Gen-Verse/Qwen3-4B-RA-SFT

Collection including Gen-Verse/Qwen3-4B-RA-SFT

Paper for Gen-Verse/Qwen3-4B-RA-SFT

👁 Paper on arXiv
👁 Open-AgentRL on GitHub
👁 30K RL Dataset
👁 DemyAgent-4B Model