
URL: https://huggingface.co/allura-org/Teleut-7b



Teleut 7b


A replication attempt of Tülu 3 on the Qwen 2.5 base models.

Evals (so far)

| Benchmark | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported) |
|---|---|---|---|---|---|
| BBH (3 shot, CoT) | 64.4% | 67.9% | 21.7% | 56.2% | 47.0% (NLL) |
| GSM8K (8 shot, CoT) | 78.5% | 76.2% | 83.8% | 80.0% | xx.x% |
| IFEval (prompt loose) | 66.3% | 72.8% | 74.7% | 56.4% | 53.0% |
| MMLU (0 shot, CoT) | 73.2% | 65.9% | 76.6% | 68.5% | 30.7% (5-shot) |
| MMLU Pro (0 shot, CoT) | 48.3% | 44.3% | 56.3% (Unknown) | 32.9% (5-shot) | 30.7% (5-shot) |
| PopQA (15 shot) | 18.9% | 29.3% | 18.1% | 20.2% | xx.x% |
| TruthfulQA | 47.2% | 46.8% | 63.1% | 55.5% | xx.x% |

Credits

Big thanks to Retis Labs for providing my 8xH100 polycule used to train and test this model!
Another big thanks to AllenAI for publishing the Tülu 3 data and model series, along with the paper and training details, and to Alibaba for training the original Qwen 2.5 base model series!

@article{lambert2024tulu3,
 title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
 author = {
 Nathan Lambert and 
 Jacob Morrison and 
 Valentina Pyatkin and 
 Shengyi Huang and 
 Hamish Ivison and 
 Faeze Brahman and 
 Lester James V. Miranda and 
 Alisa Liu and 
 Nouha Dziri and 
 Shane Lyu and 
 Yuling Gu and 
 Saumya Malik and 
 Victoria Graf and 
 Jena D. Hwang and 
 Jiangjiang Yang and
 Ronan Le Bras and
 Oyvind Tafjord and
 Chris Wilhelm and
 Luca Soldaini and 
 Noah A. Smith and 
 Yizhong Wang and 
 Pradeep Dasigi and 
 Hannaneh Hajishirzi
 },
 year = {2024},
 email = {tulu@allenai.org}
}

Training procedure

Built with Axolotl.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3.5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: paged_ademamix_8bit (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 370
  • num_epochs: 1
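
The batch-size and scheduler entries above are related by simple arithmetic, which the following sketch spells out. It assumes the usual convention that the total batch size is the per-device batch size times the number of devices times the gradient accumulation steps, and that the cosine scheduler uses a linear warmup followed by a single half-cosine decay to zero. The function names and the `total_steps` value are hypothetical; the actual number of optimizer steps is not stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_BATCH_SIZE = 8
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 3.5e-6
WARMUP_STEPS = 370


def effective_batch_size(per_device: int, devices: int, grad_accum: int = 1) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device * devices * grad_accum


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero.

    This is an assumed schedule shape, not a copy of the training code.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Training: 8 per device * 8 devices * 2 accumulation steps.
    print("train batch:", effective_batch_size(
        PER_DEVICE_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))
    # Evaluation has no gradient accumulation.
    print("eval batch:", effective_batch_size(PER_DEVICE_BATCH_SIZE, NUM_DEVICES))

    total_steps = 7000  # hypothetical; not given in the model card
    for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, total_steps):
        print(step, lr_at_step(step, total_steps))
```

Under these assumptions the training batch works out to 128 and the evaluation batch to 64, matching the totals listed above.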

Framework versions

  • Transformers 4.46.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
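
When trying to reproduce the setup, it can help to compare the installed packages against the versions listed above. The sketch below uses only the standard library. The distribution names are an assumption: PyTorch is looked up under its PyPI name `torch`, and the helper names `installed_version` and `find_mismatches` are hypothetical. Exact string matching is deliberately strict, so a different CUDA build suffix will be reported as a mismatch.

```python
from importlib import metadata

# Versions from the list above, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.46.3",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

The `lookup` parameter exists so the comparison can be exercised without the packages installed, for example by passing a dictionary's `get` method.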

Configuration


Model size: 8B params (Safetensors)
Tensor type: BF16

Model tree for allura-org/Teleut-7b

Base model: Qwen/Qwen2.5-7B