URL: https://github.com/topics/rlvr

rlvr · GitHub Topics · GitHub

rlvr

Here are 46 public repositories matching this topic...

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

rlhf agentic rlvr

Python

thinkwee / AgentsMeetRL

Awesome List for Agentic RL

agent awesome-list multiagent reinforcement llm rlhf large-language-model tool-learning agentic-workflow agentic-ai agentic-coding rlvr llm-age

HTML

pat-jj / s3

[EMNLP'25] s3 - ⚡ Efficient & Effective Search Agent Training via RL for RAG (RLVR for Search with Minimal Data)

information-retrieval efficiency verifier rag large-language-models search-agent gpt-5 agentic-ai rlvr

Python

thuml / RLVR-World

Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934

text-game video-generation robotic-manipulation video-prediction web-agent real2sim world-model webarena video-gpt grpo verl rlvr reinforcement-learning-with-verifiable-rewards

Python

InternLM / CapRL

[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

image-captioning multi-modal caption-generation llm vision-language-model large-vision-language-models grpo rlvr

Python

Tencent-Hunyuan / GradLoc

Implementation of GradLoc from the Tencent Hunyuan blog "Stabilizing RLVR via Token-level Gradient Diagnosis and Layerwise Clipping".

gradient llm hunyuan rlvr

Python

WooooDyy / BAPO

Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.

rl reasoning llm rlvr

Python

tongjingqi / Awesome-Agent-RL

A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.

agent awesome reinforcement-learning rl awesome-list llm reward-model agentic-ai rlvr agent-training


teilomillet / retrain

A Python library that uses Reinforcement Learning (RL) to train LLMs.

mcp rl llm deepseek rlvr

Python

sileod / reasoning-core

Procedural symbolic reasoning data generators suite for synthetic pretraining

procedural-generation logic dataset procedural symbolic dataset-generation reasoning pre-training data-generators llm grpo verifiers rlvr pre-pre-training procedural-dataset solver-distillation

Python

osoleve / glitchlings

Enemies for your LLM

nlp linguistics adversarial-data-augmentation rlvr

Python

RUC-GSAI / YuLan-SwarmIntell

🐝 SwarmBench: Benchmarking LLMs' Swarm Intelligence

benchmark swarm swarm-intelligence kilobots swarm-robotics llms-benchmarking rlvr

Python

HKUST-KnowComp / Reasoning-Embedding

The official repository of the paper "Do Reasoning Models Enhance Embedding Models?"

representation-learning manifold embedding reasoning rlvr

Python

smiles724 / DeepSearch

This is the official code of DeepSearch [ICLR 2026]

llm reasoning-language-models rlvr

Python

ScalingIntelligence / kernelbench-tinker

Tinker ↔ KernelBench Integration enabling RL for GPU Kernel Generation

rl tinker rlvr rl-infra

Python

Qwen-Applications / CLIPO

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

reasoning contrastive-learning large-language-models rlvr

Python

zli12321 / free-form-grpo

GRPO to train long-form QA and instructions with a long-form reward model

reinforcement-learning-algorithms evaluation-framework reward-design rl-training long-form-text-generation qwen2-5 grpo rlvr

Python

purbeshmitra / MOTIF

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

reinforcement-learning llm-training rlvr

Python

Miaow-Lab / RLVR-Linearity

[arXiv] "Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training"

llm-reasoning grpo rlvr

Python

bowen-upenn / PersonaMem-v2

PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory

personalization reinfocement-learning llm long-context personalized-generation llm-memory reinforcement-finetuning grpo verl agentic-memory rlvr

Python
