Reverse Text Model Qwen3-0.6B
A small model that was fine-tuned with RL (RLFT) for 20 steps (epochs) after SFT to reverse text, using prime-rl (RL training) and reverse-text (RL environment). The results below show the improvement over the SFT model.
Comparison with SFT (base) model
The reward (correctness score) distribution has improved for the RLFT model across all rollouts.
At the instance level, comparing the best score across rollouts for each prompt gives a mean improvement of 3.73%, with a maximum improvement of ~30% and a maximum reduction of ~3%.
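The instance-level comparison can be sketched roughly as follows. This is a minimal illustration, not the script used for the card: the function names and the reward values are made up, and it assumes the improvement is the difference between each model's best reward per prompt, expressed in percentage points (the card does not say whether the figures are absolute or relative).

```python
from statistics import mean


def best_of_rollouts(rewards_per_prompt):
    """Return the best (maximum) reward for each prompt across its rollouts."""
    return [max(rollouts) for rollouts in rewards_per_prompt]


def improvement_summary(sft_rewards, rlft_rewards):
    """Summarise per-prompt change in best reward, in percentage points.

    Assumption: both inputs list the same prompts in the same order.
    """
    sft_best = best_of_rollouts(sft_rewards)
    rlft_best = best_of_rollouts(rlft_rewards)
    deltas = [100.0 * (r - s) for s, r in zip(sft_best, rlft_best)]
    return {"mean": mean(deltas), "max": max(deltas), "min": min(deltas)}


# Hypothetical rewards: one inner list per prompt, one value per rollout.
sft = [[0.60, 0.70], [0.90, 0.95], [0.80, 0.85]]
rlft = [[0.75, 0.90], [0.92, 0.93], [0.85, 0.80]]

summary = improvement_summary(sft, rlft)
print(summary)
```

A negative `min` corresponds to prompts where the RLFT model's best rollout scored below the SFT model's best rollout.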
Example Prompt & Reward
Task: reverse-text
Prompt:
- System: “Reverse the text character-by-character. Put your answer in <reversed_text> tags.”
- User: “The community in Bruck was merged into it”
Expected Completion:
<reversed_text>
.ti otni degrem saw kcuBr ni ytinummoc ehT
</reversed_text>
Expected Reward: 0.963855421686747
Note: Reward is based on the longest common subsequence (LCS).
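To make the reward concrete, here is a minimal sketch of an LCS-style score. It is not the environment's actual code: it assumes the answer is taken from inside the `<reversed_text>` tags, that the reference is simply the prompt reversed with `[::-1]`, and that similarity is measured with `difflib.SequenceMatcher.ratio()` from the Python standard library (a matching-block heuristic that approximates, but is not strictly, an LCS ratio). The helper names `parse_answer` and `lcs_style_reward` are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Captures whatever sits between the answer tags, across newlines.
TAG_PATTERN = re.compile(r"<reversed_text>(.*?)</reversed_text>", re.DOTALL)


def parse_answer(completion: str) -> str:
    """Extract the text inside <reversed_text> tags, or "" if absent."""
    match = TAG_PATTERN.search(completion)
    return match.group(1).strip() if match else ""


def lcs_style_reward(completion: str, reference: str) -> float:
    """Similarity in [0, 1]: 2 * matched characters / total characters."""
    return SequenceMatcher(None, parse_answer(completion), reference).ratio()


prompt = "The community in Bruck was merged into it"
reference = prompt[::-1]  # assumed ground truth: exact character reversal

completion = (
    "<reversed_text>\n"
    ".ti otni degrem saw kcuBr ni ytinummoc ehT\n"
    "</reversed_text>"
)

reward = lcs_style_reward(completion, reference)
print(round(reward, 4))  # 0.9639
```

Under these assumptions the completion shown above matches 40 characters out of 42 + 41 in total, so the score is 2 × 40 / 83 ≈ 0.9639, consistent with the reward listed. The two deviations are the leading period and the transposed letters in "kcuBr".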
Model tree for sameersegal/Qwen3-0.6B-Reverse-Text-SFT-RLFT
- Base model: Qwen/Qwen3-0.6B-Base
- Finetuned: PrimeIntellect/Qwen3-0.6B