Description

This repository contains the model weights for the paper "Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning".

Official Implementation

Citation

@article{kim2025meta,
 title={Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning},
 author={Kim, Yoonjeon and Jang, Doohyuk and Yang, Eunho},
 journal={arXiv preprint arXiv:2510.03259},
 year={2025}
}

Downloads last month: 194

Safetensors

Model size: 8B params
Tensor type: BF16
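
Since the checkpoint is listed as Safetensors in BF16, a loading sketch with the Hugging Face transformers library might look like the following. This is not taken from the model card: the plain-text prompt format, the generation settings, and the helper name `generate` are assumptions for illustration, and `device_map="auto"` additionally assumes the accelerate package is installed.

```python
# Minimal sketch (assumed workflow, not the authors' documented usage).
MODEL_ID = "jadohu/Qwen3-8B-GRPO"  # repository id from this page


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and return the model's continuation of `prompt`.

    Hypothetical helper: the card does not specify a prompt template, so the
    prompt is passed as plain text.
    """
    # Heavy imports are kept inside the function so the module imports cheaply.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed above
        device_map="auto",           # assumption: accelerate is available
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("What is 17 * 24? Think step by step."))
```

Running the script downloads the full 8B-parameter checkpoint, so it needs roughly 16 GB of accelerator memory for the BF16 weights alone.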

Reinforcement Learning

Model tree for jadohu/Qwen3-8B-GRPO

Base model: Qwen/Qwen3-8B-Base
Finetuned (445): this model
Quantizations: 1 model

Dataset used to train jadohu/Qwen3-8B-GRPO

Collection including jadohu/Qwen3-8B-GRPO

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning (7 items, updated Nov 26, 2025)

Paper for jadohu/Qwen3-8B-GRPO

Paper 2510.03259, published Sep 26, 2025

URL: https://huggingface.co/jadohu/Qwen3-8B-GRPO
