A small preview of what might become the first (or second?) stepping stone for Magnum v5
Rei-12B
Another prototype Magnum... (This time with a weird loss function (that ruins VRAM usage!!!)!)
Rei Model
✨ Overview
A model meant to replicate the style of the Claude models Opus and Sonnet, made by taking the previous Rei-12B and training it with a custom subsequence loss function.
Fine-tuned on top of Mistral-Nemo-Instruct (ChatML'ified)
📥 Quantized Models
💬 Prompt Format
Rei-12B uses the ChatML format. A typical conversation should be structured as:
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
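The layout above can be produced programmatically. The following is a minimal sketch, assuming messages are plain dictionaries with `role` and `content` keys; the helper name `format_chatml` and its `add_generation_prompt` parameter are illustrative and not part of the model's own tooling.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; in practice the tokenizer's own
    chat template should be preferred if one is shipped with the model.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
prompt = format_chatml(conversation)
```

A system prompt, if used, would go in as a first message with the role `system`.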
Recommended System Prompt
⚙️ Training
Hparams
- Normal training cares about reducing overall error across the full context, but late-context error is easier to reduce and most tokens are not early tokens. A mod to the loss function instead cares about reducing error for all context lengths, which puts more emphasis on improving early-context performance.
- You can find the modeling mod here: https://huggingface.co/datasets/Delta-Vector/Configs/blob/main/modeling_mistral.py
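One possible reading of "reducing error for all context lengths" is to average the mean loss over every prefix of the sequence, rather than taking a single mean over all tokens. The sketch below illustrates that reading only; it is an assumption, not the implementation in the linked `modeling_mistral.py`, and the function names `prefix_averaged_loss` and `prefix_weights` are hypothetical.

```python
def prefix_averaged_loss(token_losses):
    """Average, over every prefix length k = 1..T, of the mean loss of the
    first k tokens. Assumed interpretation, for illustration only."""
    if not token_losses:
        raise ValueError("token_losses must not be empty")
    total = 0.0
    running_sum = 0.0
    for k, loss in enumerate(token_losses, start=1):
        running_sum += loss
        total += running_sum / k  # mean loss of the prefix of length k
    return total / len(token_losses)


def prefix_weights(num_tokens):
    """Effective weight of each position under prefix averaging.

    Position i (1-based) appears in prefixes k = i..T, each contributing 1/k,
    so earlier positions accumulate more weight than later ones.
    """
    weights = []
    tail = 0.0
    for k in range(num_tokens, 0, -1):
        tail += 1.0 / k
        weights.append(tail / num_tokens)
    weights.reverse()
    return weights


losses = [4.0, 0.0]
plain_mean = sum(losses) / len(losses)
prefix_mean = prefix_averaged_loss(losses)
```

Under this scheme the weights still sum to 1, but the first token of a two-token sequence carries three times the weight of the second, so a high early-context loss raises the objective more than it would under a plain mean.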
Configuration
The model was trained for 1 epoch on 8x NVIDIA H100 GPUs generously provided by @Kalomaze.
⚠️ Credits
I'd like to thank Ruka/Sama twinkman | LucyKnada | Kubernetes Bad | PocketDoc | Tav | Trappu | Alicat | and the rest of Anthracite/Pygmalion for testing, feedback, and support.
Rei-12B | V3
Safetensors
Model size: 12B params
Tensor type: BF16
Model tree for Delta-Vector/Rei-12B-V3-Base
Base model: mistralai/Mistral-Nemo-Base-2407
Finetuned: NewEden/MistralAI-Nemo-Instruct-ChatML