Moonlit-SummaryStories-45M

Moonlit-SummaryStories-45M is a 45M-parameter TinyStories model specialized for Summary → Story generation. It is initialized from the pretrained razor5050/TinyStories-45M checkpoint and then supervised fine-tuned (SFT) to turn a short summary prompt into a complete TinyStories-style story.

What this model does

Input format:

Summary: A little fox is afraid of the dark until a glowing jar helps him find his way home.
Story:

The model continues after "Story:" with a complete short story.

Model details

Architecture: LLaMA-style decoder-only transformer
Parameters: 45.46M
Hidden size: 512
Layers: 13
Attention heads: 8
KV heads (GQA): 4
Intermediate size: 1344
Vocabulary size: 16384
Context length: 512
Tokenizer: SentencePiece unigram
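As a sanity check, the 45.46M figure can be reproduced from the hyperparameters above. The sketch below assumes a standard LLaMA layout (bias-free projections, a SwiGLU MLP with gate/up/down matrices, two RMSNorm weight vectors per layer plus a final norm) and assumes the input embedding is tied to the output head. Those are assumptions, not documented settings, and `llama_param_count` is a hypothetical helper.

```python
def llama_param_count(hidden, layers, heads, kv_heads, intermediate, vocab,
                      tied_embeddings=True):
    """Count parameters of a bias-free LLaMA-style decoder (assumed layout)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim  # GQA: K and V project to fewer heads than Q
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj, o_proj, k_proj, v_proj
    mlp = 3 * hidden * intermediate  # gate_proj, up_proj, down_proj
    norms = 2 * hidden  # RMSNorm weights before attention and before the MLP
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden
    return layers * per_layer + embed + lm_head + final_norm


total = llama_param_count(hidden=512, layers=13, heads=8, kv_heads=4,
                          intermediate=1344, vocab=16384)
print(f"{total:,}")  # 45,463,040
```

With untied embeddings the same formula gives 53,851,648, so the reported 45.46M is consistent with tied input/output embeddings.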

Training recipe

Pretraining base

Base model: razor5050/TinyStories-45M
Original pretraining dataset: roneneldan/TinyStories
Original pretraining epochs: 3

Fine-tuning task

Dataset source: roneneldan/TinyStoriesInstruct
Fine-tuning format: "Summary: ... Story:" → full story
Loss masking: prompt tokens are masked; loss is computed only on story tokens
No-truncation policy: only samples whose prompt and story fit entirely within 512 tokens were kept
Usable SFT examples: 1,702,072
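The fine-tuning format, prompt masking, and no-truncation filter described above can be sketched as follows. This is a minimal illustration, not the repo's training script: `build_sft_example` and the `encode` callable are hypothetical, and -100 is the label value that PyTorch's `CrossEntropyLoss` (and Hugging Face causal-LM losses) ignore by default. Whether the original pipeline appended an EOS token is not stated, so it is left optional here.

```python
from typing import Callable, List, Optional

IGNORE_INDEX = -100  # label value ignored by PyTorch's CrossEntropyLoss by default
MAX_LEN = 512        # total context length (prompt + story)


def build_sft_example(summary: str, story: str,
                      encode: Callable[[str], List[int]],
                      eos_id: Optional[int] = None) -> Optional[dict]:
    """Return input_ids/labels for one example, or None if it would need truncation."""
    prompt_ids = encode(f"Summary: {summary}\nStory:")
    # How whitespace between "Story:" and the story was handled is not documented.
    story_ids = encode(story)
    if eos_id is not None:
        story_ids = story_ids + [eos_id]  # assumption: optional end-of-story token
    input_ids = prompt_ids + story_ids
    if len(input_ids) > MAX_LEN:
        return None  # no-truncation policy: drop the sample instead of cutting it
    # Mask the prompt so the loss is computed only on story tokens.
    labels = [IGNORE_INDEX] * len(prompt_ids) + story_ids
    return {"input_ids": input_ids, "labels": labels}


# Toy character-level encoder, only for illustration.
toy_encode = lambda s: [ord(c) for c in s]
example = build_sft_example("A fox.", "The end.", toy_encode)
```

Encoding the prompt and story separately keeps the mask boundary exact; with a SentencePiece tokenizer this can tokenize slightly differently from encoding the full string in one call.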

Exact usable dataset size under the 512-token no-truncation rule

Train: 1,685,116
Validation: 16,956
Total: 1,702,072

Fine-tuning hyperparameters

Epochs: 1
Effective batch size: 64
Micro-batch size: 8
Learning rate: 8e-5
Scheduler: cosine decay
Precision: FP16
Max sequence length: 512
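The batch and schedule settings imply a few derived quantities. Assuming a single GPU (as listed under Hardware) and that the last partial batch is kept, an effective batch of 64 built from micro-batches of 8 means 8 gradient-accumulation steps, and one epoch over 1,685,116 training examples is 26,330 optimizer steps. The card does not state a warmup length or minimum learning rate, so `cosine_lr` below is a hypothetical illustration that assumes no warmup and decay to zero.

```python
import math

EFFECTIVE_BATCH = 64
MICRO_BATCH = 8
TRAIN_EXAMPLES = 1_685_116
PEAK_LR = 8e-5

# One optimizer step accumulates gradients over several micro-batches.
grad_accum_steps = EFFECTIVE_BATCH // MICRO_BATCH
# Optimizer steps in one epoch; ceil keeps the final partial batch (an assumption).
total_steps = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)


def cosine_lr(step: int, total: int = total_steps, peak: float = PEAK_LR,
              min_lr: float = 0.0) -> float:
    """Cosine decay from peak to min_lr with no warmup (assumed schedule)."""
    progress = min(step, total) / total
    return min_lr + 0.5 * (peak - min_lr) * (1 + math.cos(math.pi * progress))


print(grad_accum_steps, total_steps)  # 8 26330
print(cosine_lr(0), cosine_lr(total_steps // 2), cosine_lr(total_steps))
```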

Evaluation

Validation loss: 1.238696612096066
Perplexity: 3.4511122703552246
Example generations: see evaluation/40_prompts.json
Evaluation report: see evaluation/report.md
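The two reported metrics are consistent with the usual relation perplexity = exp(loss), assuming the validation loss is the mean per-token cross-entropy in nats:

```python
import math

val_loss = 1.238696612096066  # reported validation loss
perplexity = math.exp(val_loss)
print(round(perplexity, 4))  # 3.4511
```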

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "razor5050/Moonlit-SummaryStories-45M"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# The prompt follows the fine-tuning format: a "Summary:" line, then "Story:".
prompt = "Summary: A shy rabbit learns to sing with the help of fireflies.\nStory:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation; prompt + new tokens should stay within the 512-token context.
outputs = model.generate(
    **inputs,
    max_new_tokens=220,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Files in this repo

Repository root: final fine-tuned model
checkpoints/base_pretrain_final/: pretrained base checkpoint used for fine-tuning
checkpoints/sft/: intermediate SFT checkpoints and the final SFT export
evaluation/: metrics, prompt generations, and report

Hardware

Training GPU: NVIDIA RTX 3060 12GB
Intended use: lightweight creative story generation

Notes

This model is optimized for TinyStories-style English story generation from a short summary prompt. Because the prompt and the generated story share a single 512-token context window, longer prompts leave fewer tokens for the story.
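A caller can cap `max_new_tokens` by the prompt length so that prompt plus output never exceeds 512 tokens. A minimal sketch; the helper name `generation_budget` is hypothetical, and with the Hugging Face tokenizer `prompt_len` would be `inputs["input_ids"].shape[1]`:

```python
CONTEXT_LEN = 512


def generation_budget(prompt_len: int, requested: int = 220,
                      context_len: int = CONTEXT_LEN) -> int:
    """Largest max_new_tokens that keeps prompt + output inside the context window."""
    if prompt_len >= context_len:
        raise ValueError("prompt already fills the context window")
    return min(requested, context_len - prompt_len)


print(generation_budget(30))   # 220
print(generation_budget(400))  # 112
```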

Generated on 2026-05-23 11:04:08


Weights format: Safetensors (45.5M params, tensor type F32)

URL: https://huggingface.co/razor5050/Moonlit-SummaryStories-45M
