smol_llama: 220M GQA
A small decoder-only model with 220M parameters (total). This is the first version of the model.
- 1024 hidden size, 10 layers
- GQA (32 heads, 8 key-value), context length 2048
- trained from scratch on one GPU :)
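To make the GQA numbers above concrete, here is a minimal sketch of the quantities they imply. It assumes the usual Llama convention `head_dim = hidden_size // num_attention_heads`, a contiguous mapping of query heads onto key/value heads, and a BF16 KV cache at batch size 1. `GQAConfig` and its methods are hypothetical illustration helpers, not part of any library or of this model's code.

```python
# Hypothetical helper: derives grouped-query attention (GQA) quantities from the
# config values listed above. Assumes the Llama convention
# head_dim = hidden_size // num_attention_heads (an assumption, not from the card).
from dataclasses import dataclass


@dataclass(frozen=True)
class GQAConfig:
    hidden_size: int = 1024
    num_layers: int = 10
    num_attention_heads: int = 32
    num_key_value_heads: int = 8
    context_length: int = 2048
    bytes_per_value: int = 2  # BF16

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_attention_heads

    @property
    def group_size(self) -> int:
        # Number of query heads that share one key/value head.
        return self.num_attention_heads // self.num_key_value_heads

    def kv_head_for(self, query_head: int) -> int:
        # Assumed contiguous assignment: query heads 0-3 -> KV head 0, 4-7 -> 1, ...
        return query_head // self.group_size

    def kv_cache_bytes(self, num_kv_heads: int) -> int:
        # Keys + values (factor 2) for every layer, KV head, and position.
        return (2 * self.num_layers * num_kv_heads * self.head_dim
                * self.context_length * self.bytes_per_value)


cfg = GQAConfig()
gqa = cfg.kv_cache_bytes(cfg.num_key_value_heads)
mha = cfg.kv_cache_bytes(cfg.num_attention_heads)
print(cfg.head_dim, cfg.group_size)            # 32 4
print([cfg.kv_head_for(h) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print(gqa / 2**20, mha / 2**20)                # 20.0 80.0 (MiB, batch size 1)
```

Under these assumptions, sharing 8 KV heads across 32 query heads shrinks the full-context KV cache by a factor of 4 compared with standard multi-head attention.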
Links
Here are some fine-tunes we did, but there are many more possibilities out there!
- instruct
- code
- python (pypi) - link
- zephyr DPO tune
Open LLM Leaderboard (v1) Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 29.44 |
| AI2 Reasoning Challenge (25-Shot) | 24.83 |
| HellaSwag (10-Shot) | 29.76 |
| MMLU (5-Shot) | 25.85 |
| TruthfulQA (0-shot) | 44.55 |
| Winogrande (5-shot) | 50.99 |
| GSM8k (5-shot) | 0.68 |
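As a quick check of the table, the "Avg." row matches the unweighted mean of the six benchmark scores. The sketch below assumes the leaderboard average is a plain arithmetic mean; the dictionary keys are just labels copied from the table.

```python
# Recompute the "Avg." row above as the unweighted mean of the six scores
# (assumption: the leaderboard average is a plain arithmetic mean).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.83,
    "HellaSwag (10-Shot)": 29.76,
    "MMLU (5-Shot)": 25.85,
    "TruthfulQA (0-shot)": 44.55,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.68,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # 29.44
```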
Open LLM Leaderboard v2 Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 6.62 |
| IFEval (0-Shot) | 23.86 |
| BBH (3-Shot) | 3.04 |
| MATH Lvl 5 (4-Shot) | 0.00 |
| GPQA (0-shot) | 0.78 |
| MuSR (0-shot) | 9.07 |
| MMLU-PRO (5-shot) | 1.66 |
Model: BEE-spoke-data/smol_llama-220M-GQA
Safetensors · Model size: 0.2B params · Tensor type: BF16
Evaluation results
- AI2 Reasoning Challenge (25-Shot), test set: normalized accuracy 24.83 (Open LLM Leaderboard)
- HellaSwag (10-Shot), validation set: normalized accuracy 29.76 (Open LLM Leaderboard)
- MMLU (5-Shot), test set: accuracy 25.85 (Open LLM Leaderboard)
- TruthfulQA (0-shot), validation set: mc2 44.55 (Open LLM Leaderboard)
- Winogrande (5-shot), validation set: accuracy 50.99 (Open LLM Leaderboard)
- GSM8k (5-shot), test set: accuracy 0.68 (Open LLM Leaderboard)
- IFEval (0-Shot): strict accuracy 23.86 (Open LLM Leaderboard)
- BBH (3-Shot): normalized accuracy 3.04 (Open LLM Leaderboard)
