
URL: https://huggingface.co/Delta-Vector/Rei-12B-V3-Base

Delta-Vector/Rei-12B-V3-Base · Hugging Face


Rei-12B

Another prototype Magnum... (this time with a weird loss function that ruins VRAM usage!)


✨ Overview

A model meant to replicate the style of the Claude models Opus and Sonnet. It takes the previous Rei-12B and trains it with a custom subsequence loss function.

Fine-tuned on top of Mistral-Nemo-Instruct (ChatML'ified)

📥 Quantized Models

💬 Prompt Format

Rei-12B uses the ChatML format. A typical conversation should be structured as:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
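
The layout above can be produced programmatically. The following is a minimal sketch, not the author's tooling: the helper name `format_chatml` and the `add_generation_prompt` flag are hypothetical, and it assumes each turn ends with a newline after `<|im_end|>`, as in the example conversation.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string.

    Hypothetical helper for illustration; it mirrors the example layout only.
    """
    parts = []
    for message in messages:
        # Each turn: <|im_start|>ROLE, newline, CONTENT, <|im_end|>, newline
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
prompt = format_chatml(messages)
print(prompt)
```

A system prompt, if used, would go in as a first message with the role `system`, following the usual ChatML convention.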

Recommended System Prompt

⚙️ Training

Hparams

  • Normal training reduces the overall error across the full context, but late-context tokens are easier to predict and most tokens are not early tokens. A modification to the loss function instead reduces error at all context lengths, which puts more emphasis on improving early-context performance.
  • You can find the modeling mod here: https://huggingface.co/datasets/Delta-Vector/Configs/blob/main/modeling_mistral.py
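
One way to read "reducing error for all context lengths" is to average the mean loss over every prefix of the sequence. The sketch below illustrates that reading in plain Python; it is an assumption for illustration only and is not taken from the linked `modeling_mistral.py`, which may weight tokens differently. The function names and the sample per-token losses are hypothetical.

```python
def standard_loss(token_losses):
    """Plain mean over all tokens, as in ordinary training."""
    return sum(token_losses) / len(token_losses)


def prefix_averaged_loss(token_losses):
    """Mean of the per-prefix mean losses for prefix lengths 1..T.

    Assumed formulation, not the author's confirmed loss.
    """
    running_sum = 0.0
    total = 0.0
    for k, loss in enumerate(token_losses, start=1):
        running_sum += loss
        total += running_sum / k  # mean loss over the first k tokens
    return total / len(token_losses)


def prefix_weights(n):
    """Effective weight each token position receives under prefix averaging.

    Token t (1-indexed) appears in prefixes t..n, so its weight is
    (1/n) * sum(1/k for k in t..n). Early tokens get the largest weight.
    """
    weights = []
    tail = 0.0
    for k in range(n, 0, -1):
        tail += 1.0 / k
        weights.append(tail / n)
    weights.reverse()
    return weights


# Hypothetical per-token losses: high early in the context, lower later.
token_losses = [4.0, 2.0, 1.0, 1.0]
weights = prefix_weights(len(token_losses))
print(standard_loss(token_losses))
print(prefix_averaged_loss(token_losses))
print(weights)
```

Under this reading the first token's weight is the largest and the weights fall off with position, which matches the stated goal of emphasizing early-context performance. Computing a loss per prefix rather than one mean per sequence would also be consistent with the higher VRAM usage the card mentions, though that link is a guess.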

Configuration

The model was trained for 1 epoch on 8x NVIDIA H100 GPUs generously provided by @Kalomaze.

⚠️ Credits

I'd like to thank Ruka/Sama twinkman | LucyKnada | Kubernetes Bad | PocketDoc | Tav | Trappu | Alicat | and the rest of Anthracite/Pygmalion for testing, feedback, and support.

Rei-12B | V3

Safetensors: model size 12B params, tensor type BF16