ClimbMix Baseline Large 353M

Dense Llama baseline with untied input and output embeddings, trained on 1.2B tokens from the nvidia/Nemotron-ClimbMix dataset.

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "atrost/climbmix-baseline-large-353m-1p2b-h100"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="auto",
)

trust_remote_code=True is required for the custom StairFormer/asymmetric checkpoints and harmless for the dense Llama baseline.
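Once the checkpoint loads, a quick way to sanity-check it is to decode a short continuation. The sketch below is a minimal example rather than an official recipe: the helper name generate_text, the default of 32 new tokens, and the choice of greedy decoding are illustrative assumptions, not settings taken from this repo.

```python
def generate_text(
    prompt,
    repo_id="atrost/climbmix-baseline-large-353m-1p2b-h100",
    max_new_tokens=32,  # assumed default, chosen only for illustration
):
    """Load the checkpoint and greedily decode a continuation of `prompt`."""
    # Imported lazily so that defining this helper does not require
    # torch/transformers to be installed or any weights to be downloaded.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        torch_dtype="auto",
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding, so repeated runs match
        )
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_text("Mountains are") downloads the weights on first use and returns the prompt plus the model's continuation. A baseline trained on 1.2B tokens is best treated as a research checkpoint, so this is a check that loading works rather than a measure of quality.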

Source revision

This clean repo was split out from atrost/climbmix-llama-288m-2p8b-h100 at commit 1cb4607.

Safetensors

Model size: 0.4B params
Tensor type: F32
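
The parameter count and tensor type together give a rough size for the weights alone. The following back-of-the-envelope sketch assumes the nominal 353M parameter count from the model name (the page itself rounds this to 0.4B) and ignores activations, KV cache, and framework overhead. The helper name weight_bytes and the dtype table are illustrative.

```python
# Bytes occupied by one parameter for a few common tensor types.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F16": 2}


def weight_bytes(num_params, dtype="F32"):
    """Return the number of bytes needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]


# Assumption: nominal parameter count taken from the "353M" in the model name.
NOMINAL_PARAMS = 353_000_000

f32_gb = weight_bytes(NOMINAL_PARAMS, "F32") / 1e9
bf16_gb = weight_bytes(NOMINAL_PARAMS, "BF16") / 1e9
print(f"F32 weights:  ~{f32_gb:.2f} GB")
print(f"BF16 weights: ~{bf16_gb:.2f} GB")
```

Under these assumptions the F32 checkpoint takes roughly 1.4 GB for its weights, and casting to a 16-bit type would halve that.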

URL: https://huggingface.co/atrost/climbmix-baseline-large-353m-1p2b-h100