Teleut 7b

A replication attempt of Tulu 3 on the Qwen 2.5 base models.

Evals (so far)

	Teleut 7B (measured)	Tülu 3 SFT 8B (reported)	Qwen 2.5 7B Instruct (reported)	Ministral 8B (reported)	Mistral 7B v0.3 (reported)
BBH (3 shot, CoT)	64.4%	67.9%	21.7%	56.2%	47.0%^NLL
GSM8K (8 shot, CoT)	78.5%	76.2%	83.8%	80.0%	xx.x%
IFEval (prompt loose)	66.3%	72.8%	74.7%	56.4%	53.0%
MMLU (0 shot, CoT)	73.2%	65.9%	76.6%	68.5%	30.7%^5-shot
MMLU Pro (0 shot, CoT)	48.3%	44.3%	56.3%^Unknown	32.9%^5-shot	30.7%^5-shot
PopQA (15 shot)	18.9%	29.3%	18.1%	20.2%	xx.x%
TruthfulQA	47.2%	46.8%	63.1%	55.5%	xx.x%

Credits

Big thanks to Retis Labs for being providing my 8xH100 polycule used to train and test this model!
Another big thanks to AllenAI for publishing the Tülu 3 data and model series (as well as the paper and details on training), as well as Alibaba for training the original Qwen 2.5 base model series!

@article{lambert2024tulu3,
 title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
 author = {
 Nathan Lambert and 
 Jacob Morrison and 
 Valentina Pyatkin and 
 Shengyi Huang and 
 Hamish Ivison and 
 Faeze Brahman and 
 Lester James V. Miranda and 
 Alisa Liu and 
 Nouha Dziri and 
 Shane Lyu and 
 Yuling Gu and 
 Saumya Malik and 
 Victoria Graf and 
 Jena D. Hwang and 
 Jiangjiang Yang and
 Ronan Le Bras and
 Oyvind Tafjord and
 Chris Wilhelm and
 Luca Soldaini and 
 Noah A. Smith and 
 Yizhong Wang and 
 Pradeep Dasigi and 
 Hannaneh Hajishirzi
 },
 year = {2024},
 email = {tulu@allenai.org}
}

Training procedure

👁 Built with Axolotl

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3.5e-06
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 128
total_eval_batch_size: 64
optimizer: Use paged_ademamix_8bit and the args are: No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 370
num_epochs: 1

Framework versions

Transformers 4.46.3
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3

Configuration

Downloads last month: 13

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for allura-org/Teleut-7b

Base model

Qwen/Qwen2.5-7B

Finetuned

(887)

this model

Adapters

15 models

Finetunes

1 model

Merges

4 models

Quantizations

5 models

URL: https://huggingface.co/allura-org/Teleut-7b

⇱ allura-org/Teleut-7b · Hugging Face