
URL: https://huggingface.co/shisa-ai/Llama-3.1-Tulu-3-405B-FP8-Dynamic

shisa-ai/Llama-3.1-Tulu-3-405B-FP8-Dynamic


This is an FP8 Dynamic quantization of the model, produced with llmcompressor v0.4.0.
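As a rough illustration of what "dynamic" usually means in this kind of scheme: activation scales are computed at runtime, typically one per token, rather than calibrated ahead of time. The sketch below is a plain-Python approximation under that assumption, not llmcompressor's implementation. The helper names are hypothetical, and the actual rounding onto the FP8 grid is omitted; only the scale computation is shown. The constant 448 is the largest finite value of the E4M3 format.

```python
E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def dynamic_per_token_scales(activations):
    """Return one scale per row (token) so that row / scale fits in [-448, 448].

    Hypothetical helper for illustration only.
    """
    scales = []
    for row in activations:
        amax = max(abs(v) for v in row)
        # An all-zero row gets a neutral scale to avoid dividing by zero
        scales.append(amax / E4M3_MAX if amax > 0 else 1.0)
    return scales


def scale_to_fp8_range(activations, scales):
    """Divide each row by its scale; rounding to FP8 values is not modeled."""
    return [[v / s for v in row] for row, s in zip(activations, scales)]


activations = [
    [1.0, -2.0, 4.0],
    [0.0, 0.0, 0.0],
    [896.0, 448.0, -224.0],
]
scales = dynamic_per_token_scales(activations)
scaled = scale_to_fp8_range(activations, scales)
```

Each row ends up with its largest magnitude mapped to the edge of the E4M3 range, which is why no calibration data is needed for the activations.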

The CPU offloading example is a useful reference, but to quantize on an H100 node we used the following setup to avoid out-of-memory (OOM) errors:

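from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "allenai/Llama-3.1-Tulu-3-405B"

# Build the model skeleton without allocating any weights
config = AutoConfig.from_pretrained(model_name)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Cap each of the 8 GPUs at 60GiB; everything else spills to CPU RAM
max_memory = {
    0: "60GiB",
    1: "60GiB",
    2: "60GiB",
    3: "60GiB",
    4: "60GiB",
    5: "60GiB",
    6: "60GiB",
    7: "60GiB",
    "cpu": "1500GiB",
}

# Keep each decoder layer whole on a single device
device_map = infer_auto_device_map(
    model,
    max_memory=max_memory,
    no_split_module_classes=["LlamaDecoderLayer"],
)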
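To see why the CPU entry in max_memory matters, a back-of-the-envelope estimate helps. The sketch below assumes roughly 406B parameters stored in BF16 (2 bytes each) and counts weights only, ignoring buffers, activations, and framework overhead. The helper names are hypothetical; build_max_memory simply reproduces the dict above programmatically.

```python
GIB = 1024 ** 3


def build_max_memory(num_gpus, per_gpu_gib, cpu_gib):
    """Build a max_memory dict like the one above (hypothetical helper)."""
    max_memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


def bf16_weight_gib(num_params):
    """Approximate size of BF16 weights in GiB (2 bytes per parameter)."""
    return num_params * 2 / GIB


max_memory = build_max_memory(num_gpus=8, per_gpu_gib=60, cpu_gib=1500)

weights_gib = bf16_weight_gib(406e9)       # weights only, an approximation
gpu_budget_gib = 8 * 60                    # total GPU cap across the node
cpu_spill_gib = weights_gib - gpu_budget_gib

print(f"BF16 weights: ~{weights_gib:.0f} GiB")
print(f"GPU budget:    {gpu_budget_gib} GiB")
print(f"Spills to CPU: ~{cpu_spill_gib:.0f} GiB")
```

Under these assumptions the BF16 weights alone exceed the combined GPU cap, so a large share of the layers has to be placed on the CPU, which is presumably why the CPU limit is set so generously.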

Original model here: https://huggingface.co/allenai/Llama-3.1-Tulu-3-405B

Model size: 406B params (Safetensors)
Tensor types: BF16, F8_E4M3
