URL: https://huggingface.co/rkstgr/nanochat-d24-speedrun

nanochat-d24-speedrun

A 1.4B-parameter GPT-2-style model trained from scratch with nanochat on 16× H100 GPUs.

Training

  • Architecture: 24-layer Transformer, 1536 hidden dim, 12 heads, 32K vocab
  • Training data: 5.8B tokens (ClimbMix), param:data ratio = 8
  • Precision: FP8 (tensorwise scaling)
  • Hardware: 16× NVIDIA H100
  • Throughput: 1.58M tok/sec, 47.6% bf16 MFU
  • Pretraining time: 62 minutes
  • Total pipeline time: 1h 26m (pretrain + base eval + SFT + chat eval + report)
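The figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers from the list; the variable names are illustrative. The last step is an inference rather than something the card states: dividing the token count by the ratio of 8 gives roughly 0.73B, which suggests the ratio is measured against a smaller parameter count than the 1.4B total (for example, one that excludes embeddings).

```python
# Values copied from the Training list above.
throughput_tok_per_s = 1.58e6   # reported throughput
pretrain_minutes = 62           # reported pretraining time
reported_tokens = 5.8e9         # reported training tokens
hidden_dim = 1536
n_heads = 12
param_data_ratio = 8

# Tokens processed if the reported throughput held for the whole run.
tokens_from_throughput = throughput_tok_per_s * pretrain_minutes * 60

# Relative gap between that estimate and the reported token count.
rel_gap = abs(tokens_from_throughput - reported_tokens) / reported_tokens

# Per-head dimension implied by the architecture line.
head_dim = hidden_dim // n_heads

# Parameter count implied by the ratio (an inference, see the note above).
implied_params = reported_tokens / param_data_ratio

print(f"tokens from throughput: {tokens_from_throughput / 1e9:.2f}B")
print(f"relative gap vs. reported: {rel_gap:.1%}")
print(f"head dim: {head_dim}")
print(f"params implied by ratio: {implied_params / 1e9:.3f}B")
```

The throughput estimate lands within about 2% of the reported 5.8B tokens, so the throughput, time, and token figures are mutually consistent.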

Results

Metric          Base     SFT
Val BPB         0.715    -
CORE            0.247    -
ChatCORE        -        0.360
ARC-Easy        -        61.3%
ARC-Challenge   -        48.9%
MMLU            -        36.4%
HumanEval       -        11.0%
GSM8K           -        9.8%
SpellingBee     -        99.6%

Files

  • base_checkpoints/ — Pretrained base model (step 5568)
  • chatsft_checkpoints/ — SFT fine-tuned chat model (step 482)
  • tokenizer/ — Custom BPE tokenizer (32K vocab)
  • report.md — Full training report
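One way to fetch only some of these folders is `snapshot_download` from the `huggingface_hub` package with `allow_patterns`. This is a minimal sketch, not the author's documented workflow: the helper names `folder_patterns` and `download_folders` and the default `local_dir` are hypothetical, and the layout inside each folder is not assumed beyond the folder names listed above.

```python
from typing import Iterable, List

# Repository ID taken from the model page URL.
REPO_ID = "rkstgr/nanochat-d24-speedrun"


def folder_patterns(folders: Iterable[str]) -> List[str]:
    """Turn folder names into glob patterns matching everything beneath them."""
    return [f"{name.strip('/')}/*" for name in folders]


def download_folders(folders: Iterable[str],
                     local_dir: str = "nanochat-d24-speedrun") -> str:
    """Download only the selected folders and return the local path."""
    # Imported lazily so folder_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=folder_patterns(folders),
        local_dir=local_dir,
    )
```

For example, `download_folders(["base_checkpoints", "tokenizer"])` would skip the SFT checkpoint and the report, which saves bandwidth if only the base model is needed.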

Usage

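# Requires the nanochat repo to be importable (e.g. run from a checkout of it)
from nanochat.gpt import GPT, GPTConfig
from nanochat.checkpoint_manager import load_checkpoint

# Point this at a local copy of the base_checkpoints/ directory from this repo
model, metadata = load_checkpoint("path/to/base_checkpoints")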

Acknowledgments

Compute resources provided by WestAI. Thanks to the WestAI team for their generous compute contributions.
