TinyStories-45M
A 45-million-parameter language model for creative story generation, pretrained on the TinyStories dataset and fine-tuned on TinyStoriesInstruct. The model follows the LLaMA architecture with grouped-query attention (GQA) and is optimized for short-form narrative text.
Model Details
| Attribute | Value |
|---|---|
| Architecture | LLaMA-style (decoder-only transformer) |
| Parameters | 45.46M |
| Hidden Size | 512 |
| Layers | 13 |
| Attention Heads | 8 |
| KV Heads (GQA) | 4 |
| Intermediate Size | 1344 |
| Vocab Size | 16384 |
| Context Length | 512 |
| Tied Embeddings | Yes |
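The parameter count can be reproduced from the other rows of the table. The sketch below assumes a standard LLaMA-style layout that the card does not spell out: bias-free projections, a gated MLP with three weight matrices, two RMSNorm weights per layer plus a final norm, and an output head that shares the embedding matrix. The function name `count_params` is illustrative and not part of this repository.

```python
def count_params(vocab=16384, hidden=512, layers=13, heads=8,
                 kv_heads=4, intermediate=1344, tied=True):
    """Count weights for an assumed bias-free LLaMA-style decoder."""
    head_dim = hidden // heads            # 64
    kv_dim = kv_heads * head_dim          # 256: K and V are narrower under GQA

    embed = vocab * hidden                # token embedding matrix
    attn = (hidden * hidden               # q_proj
            + 2 * hidden * kv_dim         # k_proj and v_proj
            + hidden * hidden)            # o_proj
    mlp = 3 * hidden * intermediate       # gate, up and down projections
    norms = 2 * hidden                    # two RMSNorm weight vectors per layer

    total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
    if not tied:
        total += vocab * hidden           # separate output head
    return total


print(f"{count_params():,}")  # 45,463,040
```

Under these assumptions the total comes to 45,463,040, which matches the 45.46M in the table. With untied embeddings the same configuration would be about 8.4M parameters larger.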
Training
Pretraining
- Dataset: roneneldan/TinyStories
- Epochs: 3
- Effective Batch Size: 128
- Learning Rate: 5e-4 with cosine decay
- Warmup: 1%
- Weight Decay: 0.1
- Precision: FP16
- Optimizer: AdamW
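As a rough illustration of the learning-rate settings above, the following sketch computes a warmup-plus-cosine schedule. It assumes linear warmup over the first 1% of steps and decay to zero; the card states neither the warmup shape nor the final learning rate, and the step count used in the example is hypothetical.

```python
import math


def lr_at(step, total_steps, peak_lr=5e-4, warmup_frac=0.01, min_lr=0.0):
    """Learning rate at a given optimizer step.

    Assumptions (not stated in the card): linear warmup, decay down to min_lr.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


total = 10_000  # hypothetical number of optimizer steps
for s in (0, 99, 5_050, 10_000):
    print(s, lr_at(s, total))
```

In a PyTorch training loop the same function could be scaled into a multiplier for `torch.optim.lr_scheduler.LambdaLR`; whether the original training script did this is not documented here.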
Supervised Fine-Tuning (SFT)
- Dataset: roneneldan/TinyStoriesInstruct
- Epochs: 1
- Learning Rate: 1e-4
- Loss Masking: Assistant-only (loss is computed only on the story completion tokens, not on the prompt)
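Assistant-only loss masking is commonly implemented by copying the input IDs into the labels and overwriting the prompt positions with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below shows that idea with plain lists; the helper name `mask_prompt_labels` and the token IDs are made up, and the actual training script may differ.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by PyTorch's CrossEntropyLoss by default


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels where the first prompt_len positions do not contribute to the loss."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


# Made-up token ids: 4 prompt tokens followed by 2 story tokens.
ids = [5, 17, 42, 8, 99, 2]
print(mask_prompt_labels(ids, prompt_len=4))  # [-100, -100, -100, -100, 99, 2]
```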
Tokenizer
- Type: SentencePiece Unigram
- Vocab Size: 16,384
- Special Tokens: <pad>, <eos>, <bos>, <unk>, <|im_end|>
Evaluation
| Metric | Value |
|---|---|
| Validation Loss | 0.829051066686119 |
| Perplexity | 2.2911436557769775 |
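The two metrics are related: assuming the validation loss is the mean per-token cross-entropy in nats, perplexity is its exponential. A quick check:

```python
import math

val_loss = 0.829051066686119      # validation loss from the table
perplexity = math.exp(val_loss)   # perplexity = exp(mean cross-entropy)

print(round(perplexity, 4))  # 2.2911
```

The result agrees with the reported perplexity to about six decimal places; the small remaining difference is consistent with the reported value having been computed in lower precision.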
50-Prompt Inference
See evaluation/50_prompts.json for generated story samples.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("razor5050/TinyStories-45M")
tokenizer = AutoTokenizer.from_pretrained("razor5050/TinyStories-45M")
prompt = "Features: a brave cat\nWords: moon, adventure\nSummary: A cat goes on a moon adventure\nStory:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
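The prompt in the example above uses a `Features:` / `Words:` / `Summary:` / `Story:` layout. A small helper such as the following can assemble it from separate fields; `build_prompt` is a hypothetical convenience function, and the field order is simply copied from the example rather than taken from a documented template.

```python
def build_prompt(features, words, summary):
    """Assemble an instruct-style prompt in the layout used by the usage example."""
    return (
        f"Features: {features}\n"
        f"Words: {', '.join(words)}\n"
        f"Summary: {summary}\n"
        "Story:"
    )


prompt = build_prompt(
    features="a brave cat",
    words=["moon", "adventure"],
    summary="A cat goes on a moon adventure",
)
print(prompt)
```

The resulting string is identical to the hard-coded `prompt` in the usage example and can be passed to the tokenizer in the same way.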
Hardware
- Training GPU: NVIDIA RTX 3060 12GB
- Training Time: ~8-10 hours (pretrain + SFT)
Citation
@dataset{roneneldan2023tinystories,
title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
author={Ronen Eldan and Yuanzhi Li},
year={2023}
}
Generated: 2026-05-20 18:37:02
