VOOZH about

URL: https://huggingface.co/jadohu/Qwen3-8B-GRPO

⇱ jadohu/Qwen3-8B-GRPO · Hugging Face


Description

This repository contains the model for Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning.

Official Implementation

https://github.com/akatigre/MASA-RL

Citation

@article{kim2025meta,
 title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
 author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
 journal={arXiv preprint arXiv:2510.03259},
 year={2025}
}
Downloads last month
194
Safetensors
Model size
8B params
Tensor type
BF16
·
Video Preview
loading

Model tree for jadohu/Qwen3-8B-GRPO

Finetuned
(445)
this model
Quantizations
1 model

Dataset used to train jadohu/Qwen3-8B-GRPO

Collection including jadohu/Qwen3-8B-GRPO

Paper for jadohu/Qwen3-8B-GRPO