✨ Overview
Built on the previous 12B, which was trained with Subsequence Loss, this model is meant to smooth the base's rough edges and improve coherency, intelligence, and prose while replicating the prose of the Claude models Opus and Sonnet.
Fine-tuned on top of Rei-V3-12B-Base, Rei-12B is designed to replicate the prose quality of Claude 3 models, particularly Sonnet and Opus, using a prototype Magnum V5 datamix.
📥 Quantized Models
💬 Prompt Format
Rei-12B uses the ChatML format. A typical conversation should be structured as:
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
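As a rough illustration of the layout above, here is a minimal Python sketch that assembles a ChatML prompt from a list of messages. The helper name `format_chatml` and the `add_generation_prompt` flag are hypothetical and not part of any released tooling for this model; in practice the tokenizer's own chat template would normally handle this.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Join role/content pairs into a ChatML-style prompt string.

    Hypothetical helper for illustration only; it mirrors the
    conversation layout shown above and nothing more.
    """
    parts = []
    for message in messages:
        # Each turn: opening tag plus role, newline, content, closing tag.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
print(format_chatml(conversation))
```

A system prompt, if used, would presumably go first as a message with the role `system`, following the usual ChatML convention.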
Recommended System Prompt
⚙️ Training
Hparams
- For this model we used a grad clip of 1e-4, as it proved to be the best value for Mistral-12B based models and also keeps Rewards/Chosen from flat-lining, as Hermes-genned data is... the biggest piece of dogshit.
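To give a sense of what a grad clip of 1e-4 does, here is a small pure-Python sketch of clipping by global norm. It assumes the clip is norm-based, which the card does not state, and the function name `clip_grad_norm` and the `eps` term are illustrative choices rather than the actual training code.

```python
import math


def clip_grad_norm(grads, max_norm=1e-4, eps=1e-6):
    """Rescale a flat list of gradients so their L2 norm is at most max_norm.

    Illustrative only: assumes norm-based clipping, which is an assumption
    about the training setup, not something the card confirms.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale down only when the norm exceeds the threshold; never scale up.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_grad_norm([3.0, 4.0])
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

With a threshold this small, nearly every step gets rescaled, so under this assumption the clip caps the size of every update rather than only catching occasional spikes.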
Configuration
The model was trained for 1 epoch on 8x NVIDIA H100 GPUs generously provided by @Kalomaze.
⚠️ Credits
I'd like to thank Ruka/Sama twinkman | LucyKnada | Kubernetes Bad | PocketDoc | Tav | Trappu | Alicat | and the rest of Anthracite/Pygmalion for testing, feedback, and support.
Rei-12B | KTO
Model tree for Delta-Vector/Rei-V3-KTO-12B
Base model
mistralai/Mistral-Nemo-Base-2407