Moonlit-SummaryStories-45M
Moonlit-SummaryStories-45M is a 45M-parameter TinyStories model specialized for Summary → Story generation. It starts from the pretrained checkpoint of razor5050/TinyStories-45M and is then supervised fine-tuned to take a short summary prompt and generate a complete TinyStories-style story.
What this model does
Input format:
Summary: A little fox is afraid of the dark until a glowing jar helps him find his way home.
Story:
The model continues with a full short story.
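A minimal sketch of building a prompt in the format shown above. The helper name `build_prompt` is hypothetical, and the exact whitespace (a single newline before `Story:` and no trailing space) is assumed from the example rather than confirmed by the training code.

```python
def build_prompt(summary: str) -> str:
    """Format a summary the way the input format above shows it."""
    # Collapse internal whitespace so the summary stays on a single line.
    one_line = " ".join(summary.split())
    return f"Summary: {one_line}\nStory:"


prompt = build_prompt(
    "A little fox is afraid of the dark until a glowing jar helps him find his way home."
)
print(prompt)
```

The returned string can be passed directly to the tokenizer; the model is expected to continue after `Story:`.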
Model details
- Architecture: LLaMA-style decoder-only transformer
- Parameters: 45.46M
- Hidden size: 512
- Layers: 13
- Attention heads: 8
- KV heads (GQA): 4
- Intermediate size: 1344
- Vocabulary size: 16384
- Context length: 512
- Tokenizer: SentencePiece unigram
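The listed dimensions can be checked against the stated parameter count with a short calculation. This is a sketch under assumptions that are typical for LLaMA-style models but are not stated in this card: no bias terms, a gated three-matrix MLP, two RMSNorm weight vectors per layer plus a final norm, a head dimension of hidden size divided by attention heads, and input embeddings shared with the output head.

```python
# Values from the "Model details" list.
hidden, layers, heads, kv_heads = 512, 13, 8, 4
intermediate, vocab = 1344, 16384

head_dim = hidden // heads                       # assumed: 512 / 8 = 64
attn = (hidden * hidden                          # query projection
        + 2 * hidden * kv_heads * head_dim       # key and value projections (GQA)
        + hidden * hidden)                       # output projection
mlp = 3 * hidden * intermediate                  # gate, up, and down projections
norms = 2 * hidden                               # two norm weight vectors per layer
per_layer = attn + mlp + norms

embedding = vocab * hidden                       # assumed shared with the output head
total = embedding + layers * per_layer + hidden  # final norm adds `hidden` weights
print(f"{total:,} parameters ({total / 1e6:.2f}M)")
```

Under these assumptions the total comes to 45,463,040, which agrees with the listed 45.46M.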
Training recipe
Pretraining base
- Base model: razor5050/TinyStories-45M
- Original pretraining dataset: roneneldan/TinyStories
- Original pretraining epochs: 3
Fine-tuning task
- Dataset source: roneneldan/TinyStoriesInstruct
- Finetuning format: Summary: ... Story: → full story
- Loss masking: prompt masked, loss only on story tokens
- No truncation policy: only samples that fully fit 512 total tokens were kept
- Usable SFT examples: 1702072
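The loss masking and the no-truncation filter described above could look roughly like the following. This is an illustrative sketch, not the author's preprocessing code: `build_example` is a hypothetical helper, the token ids are toy values, and it assumes the common convention of marking ignored label positions with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. Whether special tokens count toward the 512-token limit is not stated in this card.

```python
from typing import Optional

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default
MAX_LEN = 512        # context length from the model card


def build_example(prompt_ids: list[int], story_ids: list[int]) -> Optional[dict]:
    """Return input_ids and labels for one sample, or None if it does not fit."""
    input_ids = prompt_ids + story_ids
    if len(input_ids) > MAX_LEN:
        return None  # no-truncation rule: drop the sample instead of cutting it
    # Prompt positions are masked so only story tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(story_ids)
    return {"input_ids": input_ids, "labels": labels}


# Toy token ids, purely illustrative.
example = build_example(prompt_ids=[5, 6, 7], story_ids=[8, 9, 2])
too_long = build_example(prompt_ids=[1] * 500, story_ids=[2] * 20)
print(example["labels"], too_long)
```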
Exact usable dataset size under 512-token no-truncation rule
- Train: 1685116
- Validation: 16956
- Total: 1702072
Fine-tuning hyperparameters
- Epochs: 1
- Effective batch size: 64
- Micro-batch size: 8
- Learning rate: 8e-5
- Scheduler: cosine decay
- Precision: FP16
- Max sequence length: 512
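The batch sizes and schedule above imply a gradient-accumulation factor and a step count that can be worked out directly. The sketch below assumes a single GPU, no warmup, a final partial batch that still triggers an update, and decay to a minimum learning rate of zero; none of these details are stated in this card, and `cosine_lr` is a hypothetical helper.

```python
import math

train_examples = 1_685_116            # train split size from the model card
effective_batch, micro_batch = 64, 8
peak_lr = 8e-5

accum_steps = effective_batch // micro_batch               # micro-batches per optimizer update
total_steps = math.ceil(train_examples / effective_batch)  # optimizer updates in one epoch


def cosine_lr(step: int, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr over total_steps (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


print(accum_steps, total_steps)
print(cosine_lr(0), cosine_lr(total_steps // 2), cosine_lr(total_steps))
```

With these numbers, each update accumulates 8 micro-batches and one epoch takes 26,330 updates; the learning rate starts at 8e-5, passes roughly 4e-5 at the midpoint, and ends at the assumed minimum.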
Evaluation
- Validation loss: 1.238696612096066
- Perplexity: 3.4511122703552246
- Example generations: see evaluation/40_prompts.json
- Evaluation report: see evaluation/report.md
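The two metrics above are related: perplexity is the exponential of the mean per-token cross-entropy loss, assuming the loss is measured in nats. A quick check of the reported values:

```python
import math

val_loss = 1.238696612096066        # validation loss from the model card
reported_ppl = 3.4511122703552246   # perplexity from the model card

ppl = math.exp(val_loss)
print(f"exp(loss) = {ppl:.4f}, reported = {reported_ppl:.4f}")
```

The two values agree to several decimal places; any small remaining difference would be consistent with the reported perplexity having been computed in lower floating-point precision, though the card does not say how it was computed.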
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "razor5050/Moonlit-SummaryStories-45M"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# The prompt is the summary line, a newline, then "Story:" for the model to continue.
prompt = "Summary: A shy rabbit learns to sing with the help of fireflies.\nStory:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation; prompt tokens plus max_new_tokens must fit in the 512-token context.
outputs = model.generate(
    **inputs,
    max_new_tokens=220,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Files in this repo
- Root: final finetuned model
- checkpoints/base_pretrain_final/: pretrained base checkpoint used for finetuning
- checkpoints/sft/: intermediate SFT checkpoints and final SFT export
- evaluation/: metrics, prompt generations, and report
Hardware
- Training GPU: NVIDIA RTX 3060 12GB
- Intended deployment class: small creative story model
Notes
This model is optimized for TinyStories-style English story generation from a short summary prompt. Because the model context window is 512 tokens total, longer prompts reduce the available generation budget.
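The trade-off between prompt length and generation budget can be made explicit with a small helper. This is a sketch: `generation_budget` is a hypothetical function, the default of 220 new tokens comes from the usage example, and the prompt token count is whatever the tokenizer reports for a given prompt.

```python
CONTEXT_LENGTH = 512  # total context window from the model card


def generation_budget(prompt_tokens: int, requested_new_tokens: int = 220) -> int:
    """Return how many new tokens can be generated without exceeding the context."""
    remaining = max(CONTEXT_LENGTH - prompt_tokens, 0)
    return min(requested_new_tokens, remaining)


print(generation_budget(25))   # short prompt: the full request of 220 fits
print(generation_budget(400))  # long prompt: only 112 tokens remain
```

The result can be passed as `max_new_tokens` so that generation never runs past the 512-token window.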
Generated on 2026-05-23 11:04:08
