
URL: https://huggingface.co/ray0rf1re/Nano-nano_v4.5

ray0rf1re/Nano-nano_v4.5 · Hugging Face


A newer version of this model is available: ray0rf1re/Nano-nano-4.6

🧠 Nano-nano v4.5

~255.7 M · LLaMA · Instruction-tuned · From scratch


Successor to Nano-nano v4.
Same architecture family, ~8% larger (~255.7 M vs ~236 M parameters), trained from scratch on a weighted mix of 17 datasets (see Dataset Mix).


📋 Quick Facts

| Property | Value |
|---|---|
| Architecture | LLaMA (decoder-only) |
| Parameters | ~255.7 M |
| Context length | 2,048 tokens |
| Vocabulary | 50,264 tokens |
| Training loss | 5.1763 |
| Eval score | 16.7% |
| Trained on | ~0.08 B tokens |
| Hardware | NVIDIA GTX 1080 8 GB (Pascal) |
| Trained | 2026-05-09 22:50 |

๐Ÿ—๏ธ Architecture

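Standard LLaMA decoder-only transformer. Relative to v4, v4.5 widens the MLP (intermediate_size 2,688 → 2,912, ~8.3%), adds one layer (14 → 15), and doubles the context window (1,024 → 2,048 tokens); hidden size and attention layout are unchanged. Because num_key_value_heads equals num_attention_heads, attention is standard multi-head (no grouped-query attention).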

| Hyperparameter | v4 | v4.5 |
|---|---|---|
| Parameters | ~236 M | ~255.7 M |
| hidden_size | 896 | 896 |
| intermediate_size | 2,688 | 2,912 |
| num_hidden_layers | 14 | 15 |
| num_attention_heads | 14 | 14 |
| num_key_value_heads | 14 | 14 |
| head_dim | 64 | 64 |
| vocab_size | 50,264 | 50,264 |
| max_position_embeddings | 1,024 | 2,048 |
| rms_norm_eps | 1e-6 | 1e-6 |
| rope_theta | 10,000 | 10,000 |
| hidden_act | SiLU | SiLU |
| tie_word_embeddings | False | False |
| attention_bias | False | False |
| mlp_bias | False | False |
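As a sanity check, both parameter counts can be reproduced from the hyperparameters above. This is a sketch assuming the standard Hugging Face LlamaForCausalLM layout (bias-free q/k/v/o projections, a SwiGLU MLP with gate/up/down matrices, one RMSNorm weight vector per norm, and an untied lm_head); `llama_param_count` is a hypothetical helper, not part of any library.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size, num_layers,
                      num_heads, num_kv_heads, head_dim, tie_word_embeddings=False):
    """Count weights of a bias-free LLaMA decoder (assumed HF layout)."""
    embed = vocab_size * hidden_size                     # input token embeddings
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    q_o = 2 * hidden_size * num_heads * head_dim         # q_proj + o_proj
    k_v = 2 * hidden_size * num_kv_heads * head_dim      # k_proj + v_proj
    mlp = 3 * hidden_size * intermediate_size            # gate_proj, up_proj, down_proj
    norms = 2 * hidden_size                              # two RMSNorm weight vectors per layer
    per_layer = q_o + k_v + mlp + norms
    final_norm = hidden_size                             # RMSNorm before lm_head
    return embed + lm_head + num_layers * per_layer + final_norm


v4 = llama_param_count(50_264, 896, 2_688, 14, 14, 14, 64)
v45 = llama_param_count(50_264, 896, 2_912, 15, 14, 14, 64)
print(f"v4:   {v4:,}")                    # v4:   236,211,584
print(f"v4.5: {v45:,}")                   # v4.5: 255,681,664
print(f"growth: {v45 / v4 - 1:.1%}")      # growth: 8.2%
```

Under these assumptions both totals match the table, and the untied embedding and lm_head matrices alone account for about 90 M (~35%) of v4.5's parameters.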

📊 Evaluation

Automatically evaluated after training on 20 test items spanning 5 capability dimensions.

| Category | Hits | Score |
|---|---|---|
| Knowledge | 0/5 | 🔴 0% |
| Reasoning | 0/4 | 🔴 0% |
| Hallucination | 0/4 | 🔴 0% |
| Instruction | 2/4 | 🟡 50% |
| Coherence | 1/3 | 🔴 33% |
| Overall (mean of category scores) | 3/20 | 🔴 17% |

The Hallucination category measures hallucination resistance: whether the model appropriately declines questions about future events, fictional entities, or impossible premises rather than confabulating.
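The overall score (16.7% in Quick Facts, rounded to 17% above) equals the unweighted mean of the five category scores, not pooled accuracy over all 20 items (3/20 = 15%). The evaluator itself is not published, so this sketch only reproduces the arithmetic from the table; the function names are illustrative.

```python
# Per-category (hits, total) as reported in the evaluation table.
results = {
    "Knowledge": (0, 5),
    "Reasoning": (0, 4),
    "Hallucination": (0, 4),
    "Instruction": (2, 4),
    "Coherence": (1, 3),
}


def category_scores(results):
    """Fraction of items passed in each category."""
    return {name: hits / total for name, (hits, total) in results.items()}


def overall_macro(results):
    """Unweighted mean of category scores: every category counts equally."""
    scores = category_scores(results)
    return sum(scores.values()) / len(scores)


def overall_micro(results):
    """Pooled accuracy over all items: larger categories count more."""
    hits = sum(h for h, _ in results.values())
    total = sum(t for _, t in results.values())
    return hits / total


print(f"macro: {overall_macro(results):.1%}")  # macro: 16.7%
print(f"micro: {overall_micro(results):.1%}")  # micro: 15.0%
```

With categories this small, a single extra hit in Coherence (3 items) moves the macro score by more than 6 points, so differences between versions should be read with caution.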

๐Ÿ‘ Category Scores
๐Ÿ‘ Hallucination
๐Ÿ‘ Training Curves


🍳 Training

| Setting | Value |
|---|---|
| Hardware | GTX 1080 8 GB · Pascal · compute capability 6.1 |
| Precision | fp32 weights, fp16 mixed precision (AMP with GradScaler) |
| Optimizer | StovetopCooker (HyperNix, pre-Volta) |
| Learning rate | 1e-4, cosine decay |
| Warmup | 6% of steps |
| Embedding freeze | First 15% of steps |
| Effective batch | 8 × 2,048 = 16,384 tokens/step |
| Steps | 5,092 |
| Total tokens | ~0.08 B (5,092 × 16,384 ≈ 83.4 M) |
| Gradient clipping | 1.0 |
| Gradient checkpointing | ✅ |
| Peak VRAM | 5.34 GB |
| HyperNix | ✅ freezer · StovetopCooker · old_fridge · new_fridge · smoke_alarm · pans · smoker |
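The schedule is given only as percentages of training. A minimal sketch of what they imply in steps, assuming linear warmup, cosine decay to zero after warmup, and truncation of fractional step counts; the card does not document StovetopCooker's internals, a minimum LR, or exactly how embeddings are frozen, so all names here are hypothetical.

```python
import math

TOTAL_STEPS = 5_092
PEAK_LR = 1e-4
WARMUP_FRAC = 0.06        # "Warmup: 6% of steps"
EMBED_FREEZE_FRAC = 0.15  # "Embedding freeze: first 15% of steps"
TOKENS_PER_STEP = 8 * 2_048

WARMUP_STEPS = int(WARMUP_FRAC * TOTAL_STEPS)        # 305 (assumed truncation)
FREEZE_STEPS = int(EMBED_FREEZE_FRAC * TOTAL_STEPS)  # 763 (assumed truncation)


def lr_at(step: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to min_lr (assumed 0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1 + math.cos(math.pi * progress))


def embeddings_trainable(step: int) -> bool:
    """Embeddings stay frozen for the first FREEZE_STEPS optimizer steps."""
    return step >= FREEZE_STEPS


print(f"tokens: {TOTAL_STEPS * TOKENS_PER_STEP:,}")  # tokens: 83,427,328
```

In a PyTorch/transformers loop, one way to apply the freeze is toggling `requires_grad` on `model.get_input_embeddings().weight` once `embeddings_trainable(step)` becomes true.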

Dataset Mix

| Dataset | Samples | Weight | Category |
|---|---|---|---|
| Roman1111111/claude-opus-4.6-10000x | 10k | 2.5× | Claude conversations |
| WithinUsAI/GPT5.5_thinking_max_distill_god_seed_25K | 25k | 2.0× | Reasoning / thinking |
| HuggingFaceH4/MATH-500 | 500 | 2.0× | Competition math |
| lighteval/MATH-Hard | 10k | 2.0× | Hard math |
| garage-bAInd/Open-Platypus | 25k | 1.8× | Reasoning instruction |
| iamtarun/python_code_instructions_18k_alpaca | 8k | 1.6× | Python code |
| b-mc2/sql-create-context | 6k | 1.4× | SQL code |
| nvidia/OpenCodeInstruct | 30k | 1.5× | Code instruction |
| teknium/OpenHermes-2.5 | 30k | 1.5× | General instruction |
| Amod/mental_health_counseling_conversations | 5k | 1.2× | Chat / counseling |
| ray0rf1re/FineWeb-Nano | 50k | 1.0× | Web text |
| tonytins/chat-dataset | 10k | 1.0× | Conversation |
| databricks/databricks-dolly-15k | 15k | 1.0× | Instruction following |
| mlabonne/guanaco-llama2-1k | 1k | 1.0× | General QA |
| ray0rf1re/hyper-pip | 20k | 2.0× | HyperNix pip data |
| HuggingFaceH4/ultrachat_200k | 30k | 1.5× | Multi-turn chat |
| fka/awesome-chatgpt-prompts | 5k | 0.8× | Prompt engineering |
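The card does not say how the Weight column is applied (as an upsampling factor, a sampling probability, or a loss weight). One common reading is to draw sources in proportion to samples × weight; the sketch below assumes that reading, and `MIX`, `mixture_probs`, and `sample_sources` are illustrative names, not the actual training code.

```python
import random

# (approximate samples, weight) per dataset, copied from the Dataset Mix table.
MIX = {
    "Roman1111111/claude-opus-4.6-10000x": (10_000, 2.5),
    "WithinUsAI/GPT5.5_thinking_max_distill_god_seed_25K": (25_000, 2.0),
    "HuggingFaceH4/MATH-500": (500, 2.0),
    "lighteval/MATH-Hard": (10_000, 2.0),
    "garage-bAInd/Open-Platypus": (25_000, 1.8),
    "iamtarun/python_code_instructions_18k_alpaca": (8_000, 1.6),
    "b-mc2/sql-create-context": (6_000, 1.4),
    "nvidia/OpenCodeInstruct": (30_000, 1.5),
    "teknium/OpenHermes-2.5": (30_000, 1.5),
    "Amod/mental_health_counseling_conversations": (5_000, 1.2),
    "ray0rf1re/FineWeb-Nano": (50_000, 1.0),
    "tonytins/chat-dataset": (10_000, 1.0),
    "databricks/databricks-dolly-15k": (15_000, 1.0),
    "mlabonne/guanaco-llama2-1k": (1_000, 1.0),
    "ray0rf1re/hyper-pip": (20_000, 2.0),
    "HuggingFaceH4/ultrachat_200k": (30_000, 1.5),
    "fka/awesome-chatgpt-prompts": (5_000, 0.8),
}


def mixture_probs(mix):
    """Sampling probability per source, proportional to samples x weight (assumed)."""
    mass = {name: n * w for name, (n, w) in mix.items()}
    total = sum(mass.values())
    return {name: m / total for name, m in mass.items()}


def sample_sources(mix, k, seed=0):
    """Draw k source names i.i.d. from the mixture (hypothetical loader step)."""
    probs = mixture_probs(mix)
    rng = random.Random(seed)
    return rng.choices(list(probs), weights=list(probs.values()), k=k)


probs = mixture_probs(MIX)
for name, p in sorted(probs.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{p:6.1%}  {name}")
```

Under this assumption, MATH-500's 2.0× weight still yields only about 0.24% of draws, because its 500 examples are small next to the 30k to 50k-example sources.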

🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    "ray0rf1re/Nano-nano_v4.5",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ray0rf1re/Nano-nano_v4.5")


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Instruction / Response prompt template used by this card.
    text = f"### Instruction:\n{prompt}\n\n### Response:\n"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_ids = out[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True).strip()


# Examples
print(generate("Write a Python function to reverse a linked list."))
print(generate("What is the capital of France?"))
print(generate("Explain gradient descent in simple terms."))
```
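Prompt and generated tokens share the 2,048-token window, so very long instructions need trimming before `generate`. A minimal sketch, assuming the template pieces and the user prompt are tokenized separately (for example with `tokenizer(text, add_special_tokens=False)["input_ids"]`), which can differ slightly from tokenizing the joined string; `truncate_prompt_ids` is a hypothetical helper, and keeping the tail of the prompt is only one possible policy.

```python
CONTEXT_LEN = 2_048  # max_position_embeddings for v4.5


def truncate_prompt_ids(prefix_ids, prompt_ids, suffix_ids, max_new_tokens,
                        context_len=CONTEXT_LEN):
    """Fit template + prompt + generation budget into the context window.

    prefix_ids / suffix_ids: token IDs for "### Instruction:\\n" and
    "\\n\\n### Response:\\n"; prompt_ids: the user's text. When the prompt
    is too long, only its last tokens are kept.
    """
    budget = context_len - max_new_tokens - len(prefix_ids) - len(suffix_ids)
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return list(prefix_ids) + list(prompt_ids)[-budget:] + list(suffix_ids)


# Toy IDs stand in for real tokenizer output here.
ids = truncate_prompt_ids([1, 2], list(range(100, 3000)), [3], max_new_tokens=256)
print(len(ids))  # 1792, i.e. 2048 - 256
```

The resulting list can be wrapped as a batch of one (for example `torch.tensor([ids])`) and passed to `model.generate` as `input_ids`.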

⚠️ Limitations

  • Context limited to 2,048 tokens, so unsuitable for long documents
  • Trained on only ~0.08 B tokens, far fewer than production models
  • May hallucinate on obscure or out-of-distribution queries
  • Not RLHF/DPO aligned; outputs may vary in safety and tone
  • Pascal GPU constraint (GTX 1080): trained in fp32/fp16 only, no bf16

📜 Citation

@misc{nano-nano-v45,
 author = {ray0rf1re},
 title = {Nano-nano v4.5: Compact LLaMA-Family Causal LM},
 year = {2026},
 publisher = {HuggingFace},
 howpublished = {https://huggingface.co/ray0rf1re/Nano-nano_v4.5},
}
1,865 downloads last month · Safetensors · model size 0.3B params · tensor type F32
