ClimbMix Baseline Large 353M
Dense Llama baseline with untied input and output embeddings, trained on 1.2B tokens from nvidia/Nemotron-ClimbMix.
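To give a sense of the training budget relative to model size, the tokens-per-parameter ratio can be computed from the figures above. This is a rough sketch that assumes about 353M parameters, taken from the model name. The exact parameter count may differ slightly.

```python
# Rough training-budget ratio for this checkpoint.
# Assumption: ~353M parameters (from the model name); 1.2B tokens (from the description).
NUM_PARAMS = 353_000_000
TRAIN_TOKENS = 1_200_000_000


def tokens_per_param(tokens: int, params: int) -> float:
    """Return how many training tokens were seen per model parameter."""
    return tokens / params


print(f"{tokens_per_param(TRAIN_TOKENS, NUM_PARAMS):.2f} tokens per parameter")
```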
Loading
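```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "atrost/climbmix-baseline-large-353m-1p2b-h100"

# Download the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # needed for custom architectures; harmless here
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
)
```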
trust_remote_code=True is required for the custom StairFormer/asymmetric checkpoints and harmless for the dense Llama baseline.
Source revision
This clean repo was split out from atrost/climbmix-llama-288m-2p8b-h100 at commit 1cb4607.
Checkpoint format: Safetensors, 0.4B parameters, F32 tensors.
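As a rough guide to hardware requirements, F32 weights take 4 bytes per parameter. The sketch below assumes about 353M parameters, taken from the model name; the Hub rounds this to 0.4B. It counts the weights only and ignores activations, the KV cache, and framework overhead.

```python
# Rough weight-memory estimate for this checkpoint.
# Assumption: ~353M parameters (from the model name); the exact count may differ slightly.
NUM_PARAMS = 353_000_000

# Bytes per element for common floating-point dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Return the bytes needed to hold the weights alone in the given dtype."""
    return num_params * BYTES_PER_ELEMENT[dtype]


for dtype in ("float32", "bfloat16"):
    gb = weight_bytes(NUM_PARAMS, dtype) / 1e9  # decimal gigabytes
    print(f"{dtype}: {gb:.2f} GB")
```

Passing `torch_dtype=torch.bfloat16` to `from_pretrained` instead of `"auto"` would roughly halve the weight memory, at some loss of precision compared with the stored F32 weights.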
