Rei-12B

Another prototype Magnum... (this time with a weird loss function that ruins VRAM usage!!!)

✨ Overview

A model meant to replicate the style of the Claude models Opus and Sonnet, made by taking the previous Rei-12B and training it further with a custom subsequence loss function.

Fine-tuned on top of Mistral-Nemo-Instruct (ChatML'ified)

📥 Quantized Models

GGUF Quant

💬 Prompt Format

Rei-12B uses the ChatML format. A typical conversation should be structured as:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
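For scripted use, the turn layout above can be rendered with a few lines of Python. This is a minimal sketch, assuming the standard ChatML convention of a newline after the role tag and after each `<|im_end|>`. `build_chatml_prompt` is a hypothetical helper, not part of the model repository; if the repository's tokenizer ships a chat template, treat that template as the reference.

```python
def build_chatml_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} messages as ChatML (hypothetical helper, not from the repo)."""
    # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave an assistant turn open so the model generates the next reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    conversation = [
        {"role": "user", "content": "Hi there!"},
        {"role": "assistant", "content": "Nice to meet you!"},
        {"role": "user", "content": "Can I ask a question?"},
    ]
    # Reproduces the example conversation shown above.
    print(build_chatml_prompt(conversation))
```

A system prompt would go first as a message with role `"system"`. Passing `add_generation_prompt=False` renders a finished conversation without the trailing open assistant turn.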

Recommended System Prompt

⚙️ Training

Hparams

Normal training minimizes the average error over the full context. However, late-context tokens are easier to predict, and most tokens in a sequence are not early tokens, so early-context performance gets comparatively little weight. This modification to the loss function instead cares about reducing error at every context length, which places more emphasis on improving early-context performance.
You can find the modeling mod here: https://huggingface.co/datasets/Delta-Vector/Configs/blob/main/modeling_mistral.py
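The linked `modeling_mistral.py` is the actual implementation. As a rough illustration only, the sketch below assumes that "reducing error for all context lengths" means averaging the mean loss of every prefix of the sequence, instead of taking one mean over the whole sequence. This interpretation and all function names are hypothetical and may differ from the linked code. It works on a plain list of per-token losses so the resulting weighting is easy to inspect.

```python
from itertools import accumulate


def standard_loss(token_losses: list[float]) -> float:
    """Ordinary objective: one mean over every token in the full context."""
    return sum(token_losses) / len(token_losses)


def subsequence_loss(token_losses: list[float]) -> float:
    """Hypothetical 'all context lengths' objective (an assumption, not the linked code):
    for every prefix length k, take the mean loss of tokens 1..k, then average those means."""
    n = len(token_losses)
    prefix_sums = accumulate(token_losses)  # running sum of losses over tokens 1..k
    prefix_means = [s / k for k, s in enumerate(prefix_sums, start=1)]
    return sum(prefix_means) / n


def subsequence_weights(n: int) -> list[float]:
    """Effective weight of each position under subsequence_loss.
    Token i (1-indexed) lies inside prefixes k = i..n, each contributing 1/k,
    so its weight is (1/n) * sum_{k=i}^{n} 1/k; the weights sum to 1."""
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]


if __name__ == "__main__":
    losses = [4.0, 3.0, 2.0, 1.0]  # toy per-token losses, higher early in the context
    print(standard_loss(losses))     # 2.5
    print(subsequence_loss(losses))  # 3.25
    print(subsequence_weights(len(losses)))
```

With these toy losses the ordinary mean is 2.5 while the prefix-averaged loss is 3.25: under this scheme token i gets weight (1/n)·Σ_{k=i}^{n} 1/k, so the first tokens count the most and high early-context error is penalized more heavily.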

Configuration

The model was trained for 1 epoch on 8x NVIDIA H100 GPUs, generously provided by @Kalomaze.

👁 Built with Axolotl

⚠️ Credits

Rei-12B | V3


Safetensors

Model size: 12B params

Tensor type: BF16


Model tree for Delta-Vector/Rei-12B-V3-Base

Base model: mistralai/Mistral-Nemo-Base-2407

Finetuned: NewEden/MistralAI-Nemo-Instruct-ChatML

Finetuned (4): this model

Finetunes: 2 models

Quantizations