gelu 261M (d=12) — seed 2
Reproducibility run of the gelu 261M ablation, trained with random seed 2
(seed 0 is the canonical published checkpoint). The architecture, data, and
hyper-parameters are the same as in the canonical run; only the random seed
differs. This makes the checkpoint useful for estimating seed-to-seed variance
when comparing architectures.
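from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the custom model code shipped in the repo.
m = AutoModelForCausalLM.from_pretrained(
    "mlnomad/gelu-d12-chinchilla-261M-seed2-pytorch",
    trust_remote_code=True,
)
# The tokenizer is loaded from the Mistral-7B-v0.1 repo.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")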
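As a minimal sketch of the variance-estimation use case: assuming you have already evaluated each seed's checkpoint and collected one validation loss per seed, you can compare the gap between two architectures against the seed-to-seed spread. The function names and the loss values below are hypothetical placeholders for illustration, not measured results and not part of this repo.

```python
import math
import statistics


def seed_spread(losses):
    """Return (mean, sample standard deviation) of per-seed eval losses."""
    mean = statistics.fmean(losses)
    std = statistics.stdev(losses)  # sample std, n - 1 denominator
    return mean, std


def gap_in_noise_units(losses_a, losses_b):
    """Difference of mean losses divided by the pooled seed-to-seed std."""
    mean_a, std_a = seed_spread(losses_a)
    mean_b, std_b = seed_spread(losses_b)
    n_a, n_b = len(losses_a), len(losses_b)
    pooled_var = ((n_a - 1) * std_a**2 + (n_b - 1) * std_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Hypothetical placeholder losses (one per seed), NOT real measurements.
placeholder_arch_a = [3.10, 3.12, 3.11]
placeholder_arch_b = [3.05, 3.07, 3.06]

mean_a, std_a = seed_spread(placeholder_arch_a)
print(f"arch A: mean={mean_a:.3f}, std={std_a:.3f}")
print(f"gap in noise units: {gap_in_noise_units(placeholder_arch_a, placeholder_arch_b):.2f}")
```

A gap that is small relative to the pooled standard deviation suggests the difference between architectures may be within seed noise. With only a few seeds per architecture the estimate is rough.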
License: Apache 2.0.
Safetensors · Model size: 0.3B params · Tensor type: F32
