URL: https://huggingface.co/mlnomad/gelu-d12-chinchilla-261M-seed2-pytorch

gelu 261M (d=12) — seed 2

This checkpoint is a reproducibility run (seed 2) of the gelu 261M ablation; seed 0 is the canonical published checkpoint. Architecture, data, and hyperparameters are identical; only the random seed differs. That makes it useful for estimating seed-to-seed variance when comparing architectures.
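As a rough illustration of the variance-estimation use case, the sketch below summarizes per-seed evaluation losses and checks whether the gap between two architectures is larger than the seed noise. The helper names (`seed_spread`, `gap_exceeds_noise`), the threshold `k`, and the loss values are hypothetical placeholders, not part of this repo and not measured results.

```python
import statistics


def seed_spread(losses):
    """Return (mean, sample standard deviation) of per-seed eval losses."""
    if len(losses) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return statistics.mean(losses), statistics.stdev(losses)


def gap_exceeds_noise(losses_a, losses_b, k=2.0):
    """True if the mean gap between two architectures exceeds
    k times the pooled per-seed standard deviation (a crude heuristic,
    not a formal significance test)."""
    mean_a, sd_a = seed_spread(losses_a)
    mean_b, sd_b = seed_spread(losses_b)
    pooled = ((sd_a ** 2 + sd_b ** 2) / 2) ** 0.5
    return abs(mean_a - mean_b) > k * pooled


# Placeholder numbers for illustration only, NOT measured results.
gelu_losses = [3.00, 3.02, 3.01]    # e.g. seeds 0, 1, 2 of one architecture
other_losses = [2.90, 2.92, 2.91]   # same seeds for a second architecture

mean, sd = seed_spread(gelu_losses)
print(f"mean={mean:.3f} sd={sd:.3f}")
print("gap exceeds seed noise:", gap_exceeds_noise(gelu_losses, other_losses))
```

With only a handful of seeds the standard deviation is itself a noisy estimate, so a check like this is best read as a sanity filter rather than a statistical test.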

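from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the custom model code shipped in the repo.
model = AutoModelForCausalLM.from_pretrained(
    "mlnomad/gelu-d12-chinchilla-261M-seed2-pytorch",
    trust_remote_code=True,
)
# The tokenizer is loaded from the Mistral-7B-v0.1 repo.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")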

License: Apache 2.0.

Format: Safetensors. Model size: 0.3B params. Tensor type: F32.
