VOOZH about

URL: https://huggingface.co/sameersegal/Qwen3-0.6B-Reverse-Text-SFT-RLFT

⇱ sameersegal/Qwen3-0.6B-Reverse-Text-SFT-RLFT · Hugging Face


Reverse Text Model Qwen3-0.6B

Simple model that was RL FT for 20 steps / epochs after SFT to reverse text using prime-rl (RL Training) and reverse-text (RL Environment). See the improvement in results:

Comparison with SFT (base) model

The reward (correctness score) distribution has improved for the RLFT model across all rollouts. 👁 Image

At an instance level, if we compare the best scores across rollouts, we see a mean improvement of 3.73%. But a maximum of ~30% and reduction of ~3% 👁 Image

Example Prompt & Reward

Task: reverse-text

Prompt:

  • System:
    “Reverse the text character-by-character. Put your answer in <reversed_text> tags.”
  • User:
    “The community in Bruck was merged into it”

Expected Completion:

<reversed_text>
.ti otni degrem saw kcuBr ni ytinummoc ehT
</reversed_text>

Expected Reward: 0.963855421686747

Note: Reward is basd on the long common subsequence

Downloads last month
2
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sameersegal/Qwen3-0.6B-Reverse-Text-SFT-RLFT

Dataset used to train sameersegal/Qwen3-0.6B-Reverse-Text-SFT-RLFT