Arabic AI TinyStories GPT (12M parameters)
A small decoder-only GPT trained from scratch on TinyStories. It was built line by line in PyTorch as part of the "Build a GPT From Scratch on a 6 GB GPU" YouTube playlist, without using any model classes from the transformers library.
Model description
| Property | Value |
|---|---|
| Parameters | ~12.4M |
| Layers | 6 |
| Heads | 6 |
| Embedding dim | 384 |
| Context length | 256 tokens |
| Vocabulary | 4,096 (custom BPE) |
| Training data | TinyStories |
| Training steps | ~25,000 |
| Validation loss | ~1.7 (varies by run) |
The model writes short children's stories in English. It was trained with a custom tokenizer (tinystories_tokenizer.json) and a hand-written GPT class in pure PyTorch.
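The parameter count in the table can be sanity-checked from the other hyperparameters. The sketch below assumes a GPT-2-style layout (fused QKV projection, 4x MLP expansion, learned position embeddings, biases on linear layers and LayerNorms, and an output head tied to the token embedding). None of these details are confirmed by this card, and `gpt_param_count` is a hypothetical helper, not something taken from model.py.

```python
def gpt_param_count(n_layer, n_embd, vocab_size, block_size, bias=True, tied=True):
    """Count parameters of a GPT-2-style decoder (assumed layout, not read from model.py)."""
    d = n_embd
    tok_emb = vocab_size * d          # token embedding table
    pos_emb = block_size * d          # learned position embeddings (assumption)
    # Attention: fused QKV projection (d -> 3d) plus output projection (d -> d)
    attn = d * 3 * d + d * d
    # MLP: d -> 4d -> d
    mlp = d * 4 * d + 4 * d * d
    # Two LayerNorms per block, each with a weight vector (and a bias if enabled)
    ln = 2 * d * (2 if bias else 1)
    # Biases of the four linear layers in a block
    biases = (3 * d + d + 4 * d + d) if bias else 0
    per_block = attn + mlp + ln + biases
    final_ln = d * (2 if bias else 1)
    # A tied output head reuses the token embedding and adds no parameters
    lm_head = 0 if tied else vocab_size * d
    return tok_emb + pos_emb + n_layer * per_block + final_ln + lm_head


total = gpt_param_count(n_layer=6, n_embd=384, vocab_size=4096, block_size=256)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to about 12.3M, in the same range as the ~12.4M listed above. The remaining gap would depend on details of model.py that this card does not spell out. With an untied output head the same layout would be closer to 13.9M, which suggests the embedding and output weights are shared.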
Files in this repository
| File | Description |
|---|---|
| model.pt | Full training checkpoint (model, config, step, val_loss) |
| config.json | Architecture hyperparameters |
| tinystories_tokenizer.json | BPE tokenizer from Episode 2 |
| model.py | GPT and GPTConfig source (for loading locally) |
Usage
Install dependencies
pip install torch tokenizers huggingface_hub
Download and generate
import importlib.util

import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

REPO_ID = "luayas1977/arabicai-tinystories-gpt"

# Download artifacts (huggingface_hub caches them locally after the first call)
ckpt_path = hf_hub_download(repo_id=REPO_ID, filename="model.pt")
tok_path = hf_hub_download(repo_id=REPO_ID, filename="tinystories_tokenizer.json")
model_py = hf_hub_download(repo_id=REPO_ID, filename="model.py")

# Load the GPT class directly from the downloaded model.py
spec = importlib.util.spec_from_file_location("gpt_model", model_py)
gpt_model = importlib.util.module_from_spec(spec)
spec.loader.exec_module(gpt_model)
GPT = gpt_model.GPT

device = "cuda" if torch.cuda.is_available() else "cpu"
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust
checkpoint = torch.load(ckpt_path, map_location=device, weights_only=False)
model = GPT(checkpoint["config"]).to(device)
model.load_state_dict(checkpoint["model"])
model.eval()

# Encode the prompt as a batch of one sequence
tokenizer = Tokenizer.from_file(tok_path)
prompt = "Once upon a time"
prompt_ids = tokenizer.encode(prompt).ids
idx = torch.tensor([prompt_ids], dtype=torch.long, device=device)
eos_id = tokenizer.token_to_id("<|endoftext|>")

with torch.no_grad():
    out = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=40, eos_token_id=eos_id)

text = tokenizer.decode(out[0].tolist())
print(text)
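The `temperature` and `top_k` arguments passed to `generate` control how each next token is drawn. As an illustration only, here is a minimal pure-Python sketch of one sampling step. It is not the code in model.py, whose `generate` may differ in detail, and `sample_top_k` is a hypothetical name introduced here.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    # Keep only the ids of the k highest-scoring tokens
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it
    scaled = [logits[i] / temperature for i in top_ids]
    # Numerically stable softmax weights over the surviving logits
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally
    return rng.choices(top_ids, weights=weights, k=1)[0]


# Toy logits over a 4-token vocabulary
toy_logits = [0.1, 2.0, -1.0, 1.5]
print(sample_top_k(toy_logits, temperature=0.8, top_k=2, rng=random.Random(0)))
```

With `top_k=1` this reduces to greedy decoding. Larger `top_k` and higher `temperature` give more varied stories at the cost of coherence.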
Try it in the browser
Arabic AI TinyStories GPT Playground — interactive Gradio demo with temperature, top-k, and max-tokens controls.
Model repo: luayas1977/arabicai-tinystories-gpt
Training details
- Optimizer: AdamW (lr 3e-4, weight decay 0.1)
- Schedule: Linear warmup (500 steps) + cosine decay
- Batch size: 32 sequences × 256 tokens
- Hardware: Consumer GPU (~6 GB VRAM)
- Runtime: ~2 hours for 25,000 steps
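The schedule and batch figures above can be made concrete with a short sketch. The peak learning rate, warmup length, step count, and batch shape come from the list. The final learning rate `min_lr` is not stated in this card, so it is set to 0.0 here purely as an assumption, and `lr_at` is a hypothetical helper rather than the training script's own function.

```python
import math


def lr_at(step, max_lr=3e-4, warmup_steps=500, total_steps=25_000, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay to min_lr (min_lr is an assumption)."""
    if step < warmup_steps:
        # Linear ramp that reaches max_lr on the last warmup step
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1)
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (max_lr - min_lr)


# Token budget implied by the batch shape and step count
tokens_per_step = 32 * 256
total_tokens = tokens_per_step * 25_000

for s in (0, 499, 12_750, 24_999):
    print(s, f"{lr_at(s):.2e}")
print(f"{tokens_per_step:,} tokens per step, {total_tokens:,} tokens in total")
```

At 8,192 tokens per step, 25,000 steps process about 205M tokens. Over the stated ~2 hours that is roughly 28,000 tokens per second.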
Limitations
- Small model trained only on TinyStories — not suitable for general knowledge, code, or adult topics
- English only
- Context limited to 256 tokens
- May repeat story openers or run past a natural ending if the end-of-text token was not learned strongly, which depends on how the training data was prepared
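Two of these limitations can be handled on the caller's side. The sketch below uses two hypothetical helpers that operate on plain lists of token ids: one keeps a long prompt within the 256-token context, the other cuts generated output at the first end-of-text token in case generation runs past it. Whether `generate` in model.py already does either is not stated here, so treat this as an optional safeguard.

```python
def crop_to_context(token_ids, block_size=256):
    """Keep only the most recent block_size tokens of a prompt."""
    return list(token_ids[-block_size:])


def truncate_at_eos(token_ids, eos_id):
    """Drop the first EOS token and everything after it.

    Returns the ids unchanged if eos_id is None or never appears.
    """
    ids = list(token_ids)
    if eos_id is None or eos_id not in ids:
        return ids
    return ids[: ids.index(eos_id)]


# Toy example with made-up token ids, where 0 stands in for the EOS id
generated = [17, 42, 8, 0, 17, 42]
print(truncate_at_eos(generated, eos_id=0))
print(len(crop_to_context(list(range(300)))))
```

In the usage script these would be applied to `prompt_ids` before building `idx`, and to `out[0].tolist()` before decoding. Note that `tokenizer.token_to_id` returns `None` when the token is not in the vocabulary, which is why `truncate_at_eos` accepts `None`.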
Citation / series
Part of Build a GPT From Scratch on a 6 GB GPU — YouTube playlist.
