A newer version of this model is available: ray0rf1re/Nano-nano-4.6
Nano-nano v4.5
~255.7 M · LLaMA · Instruction-tuned · From scratch
Successor to Nano-nano v4.
Same architecture family, ~8.5% larger, trained from scratch on a weighted mix of 17 datasets (listed under Dataset Mix).
Quick Facts
| Property | Value |
|---|---|
| Architecture | LLaMA (decoder-only) |
| Parameters | ~255.7 M |
| Context length | 2 048 tokens |
| Vocabulary | 50 264 tokens |
| Training loss | 5.1763 |
| Eval score | 16.7% |
| Trained on | 0.08 B tokens |
| Hardware | NVIDIA GTX 1080 8 GB (Pascal) |
| Trained | 2026-05-09 22:50 |
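Architecture
Standard LLaMA decoder-only transformer. Compared with v4, the MLP is ~8.3% wider (intermediate_size from 2 688 to 2 912), there is one extra layer (14 to 15), and the maximum context doubles from 1 024 to 2 048 positions. hidden_size stays at 896.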
| Hyperparameter | v4 | v4.5 |
|---|---|---|
| Parameters | ~236 M | ~255.7 M |
| `hidden_size` | 896 | 896 |
| `intermediate_size` | 2 688 | 2 912 |
| `num_hidden_layers` | 14 | 15 |
| `num_attention_heads` | 14 | 14 |
| `num_key_value_heads` | 14 | 14 |
| `head_dim` | 64 | 64 |
| `vocab_size` | 50 264 | 50 264 |
| `max_position_embeddings` | 1 024 | 2 048 |
| `rms_norm_eps` | 1e-6 | 1e-6 |
| `rope_theta` | 10 000 | 10 000 |
| `hidden_act` | SiLU | SiLU |
| `tie_word_embeddings` | False | False |
| `attention_bias` | False | False |
| `mlp_bias` | False | False |
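The parameter counts in the table can be cross-checked from the hyperparameters alone. The sketch below assumes the usual LLaMA layout (bias-free q/k/v/o projections, a three-matrix gated MLP, two RMSNorm weights per layer plus a final norm, and an untied output head); `LlamaShape` and `count_parameters` are illustrative names, not part of the model's code.

```python
from dataclasses import dataclass


@dataclass
class LlamaShape:
    """Hyperparameters copied from the architecture table."""
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: int
    vocab_size: int
    tie_word_embeddings: bool = False


def count_parameters(s: LlamaShape) -> int:
    """Estimate the weight count, assuming a standard bias-free LLaMA block."""
    q_proj = s.hidden_size * s.num_attention_heads * s.head_dim
    kv_proj = 2 * s.hidden_size * s.num_key_value_heads * s.head_dim
    o_proj = s.num_attention_heads * s.head_dim * s.hidden_size
    mlp = 3 * s.hidden_size * s.intermediate_size  # gate, up, down projections
    norms = 2 * s.hidden_size                      # two RMSNorm weights per layer
    per_layer = q_proj + kv_proj + o_proj + mlp + norms

    embed = s.vocab_size * s.hidden_size
    lm_head = 0 if s.tie_word_embeddings else embed
    final_norm = s.hidden_size
    return embed + lm_head + s.num_hidden_layers * per_layer + final_norm


v4 = LlamaShape(896, 2688, 14, 14, 14, 64, 50264)
v45 = LlamaShape(896, 2912, 15, 14, 14, 64, 50264)

print(f"v4   ~{count_parameters(v4) / 1e6:.1f} M")
print(f"v4.5 ~{count_parameters(v45) / 1e6:.1f} M")
```

Under these assumptions the estimates land on the reported ~236 M and ~255.7 M, with roughly 90 M of the weights sitting in the untied input and output embeddings.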
Evaluation
Automatically evaluated after training across 5 capability dimensions.
| Category | Hits | Score |
|---|---|---|
| Knowledge | 0/5 | 🔴 0% |
| Reasoning | 0/4 | 🔴 0% |
| Hallucination | 0/4 | 🔴 0% |
| Instruction | 2/4 | 🟡 50% |
| Coherence | 1/3 | 🔴 33% |
| Overall | n/a | 🔴 17% |
Hallucination resistance: whether the model appropriately declines questions about future events, fictional entities, or impossible premises rather than confabulating.
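The overall figure in the table appears to be the unweighted mean of the five category scores rather than the pooled hit rate. The sketch below reproduces both from the Hits column; the function names are illustrative and the evaluation harness itself is not shown in this card.

```python
# (hits, total) per category, copied from the evaluation table.
RESULTS = {
    "Knowledge": (0, 5),
    "Reasoning": (0, 4),
    "Hallucination": (0, 4),
    "Instruction": (2, 4),
    "Coherence": (1, 3),
}


def macro_average(results: dict) -> float:
    """Mean of per-category accuracies (every category counts equally)."""
    scores = [hits / total for hits, total in results.values()]
    return sum(scores) / len(scores)


def micro_average(results: dict) -> float:
    """Pooled accuracy over all questions (every question counts equally)."""
    hits = sum(h for h, _ in results.values())
    total = sum(t for _, t in results.values())
    return hits / total


print(f"macro: {macro_average(RESULTS):.1%}")
print(f"micro: {micro_average(RESULTS):.1%}")
```

The macro average matches the reported 16.7%, while pooling all 20 questions would give 3/20.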
Figures (not shown): Category Scores, Hallucination, Training Curves
Training
| Setting | Value |
|---|---|
| Hardware | GTX 1080 8 GB · Pascal · compute capability 6.1 |
| Precision | fp32 weights / fp16 AMP (GradScaler) |
| Optimizer | StovetopCooker (HyperNix, pre-Volta) |
| LR | 0.0001 cosine decay |
| Warmup | 6% of steps |
| Embedding freeze | First 15% of steps |
| Effective batch | 8 × 2048 = 16,384 tokens/step |
| Steps | 5092 |
| Total tokens | 0.08 B |
| Grad clipping | 1.0 |
| Grad checkpointing | โ |
| Peak VRAM | 5.34 GB |
| HyperNix | freezer · StovetopCooker · old_fridge · new_fridge · smoke_alarm · pans · smoker |
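To make the schedule rows concrete, here is a minimal sketch of a warmup-plus-cosine learning rate over 5092 steps. The table only states "6% warmup" and "cosine decay", so the linear warmup shape and the decay to zero are assumptions; `lr_at` and `embeddings_frozen` are hypothetical helpers, not HyperNix functions.

```python
import math

TOTAL_STEPS = 5092   # from the training table
PEAK_LR = 1e-4       # from the training table
WARMUP_FRAC = 0.06   # 6% of steps
FREEZE_FRAC = 0.15   # embeddings frozen for the first 15% of steps


def lr_at(step: int, total: int = TOTAL_STEPS, peak: float = PEAK_LR,
          warmup_frac: float = WARMUP_FRAC, min_lr: float = 0.0) -> float:
    """Learning rate at a 0-based step: linear warmup (assumed), then cosine decay."""
    warmup_steps = int(total * warmup_frac)
    if step < warmup_steps:
        return peak * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total - warmup_steps)
    return min_lr + 0.5 * (peak - min_lr) * (1.0 + math.cos(math.pi * progress))


def embeddings_frozen(step: int, total: int = TOTAL_STEPS,
                      freeze_frac: float = FREEZE_FRAC) -> bool:
    """True while the embedding matrix would still be excluded from updates."""
    return step < int(total * freeze_frac)


for step in (0, 304, 763, 2546, 5091):
    print(step, f"{lr_at(step):.2e}", embeddings_frozen(step))
```

With these settings warmup covers the first 305 steps and the embedding freeze ends after step 762.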
Dataset Mix
| Dataset | Samples | Weight | Category |
|---|---|---|---|
| Roman1111111/claude-opus-4.6-10000x | 10 k | 2.5× | Claude conversations |
| WithinUsAI/GPT5.5_thinking_max_distill_god_seed_25K | 25 k | 2.0× | Reasoning / thinking |
| HuggingFaceH4/MATH-500 | 500 | 2.0× | Competition math |
| lighteval/MATH-Hard | 10 k | 2.0× | Hard math |
| garage-bAInd/Open-Platypus | 25 k | 1.8× | Reasoning instruction |
| iamtarun/python_code_instructions_18k_alpaca | 8 k | 1.6× | Python code |
| b-mc2/sql-create-context | 6 k | 1.4× | SQL code |
| nvidia/OpenCodeInstruct | 30 k | 1.5× | Code instruction |
| teknium/OpenHermes-2.5 | 30 k | 1.5× | General instruction |
| Amod/mental_health_counseling_conversations | 5 k | 1.2× | Chat / counseling |
| ray0rf1re/FineWeb-Nano | 50 k | 1.0× | Web text |
| tonytins/chat-dataset | 10 k | 1.0× | Conversation |
| databricks/databricks-dolly-15k | 15 k | 1.0× | Instruction following |
| mlabonne/guanaco-llama2-1k | 1 k | 1.0× | General QA |
| ray0rf1re/hyper-pip | 20 k | 2.0× | HyperNix pip data |
| HuggingFaceH4/ultrachat_200k | 30 k | 1.5× | Multi-turn chat |
| fka/awesome-chatgpt-prompts | 5 k | 0.8× | Prompt engineering |
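One way to read the Weight column is as a multiplier on how often each dataset is sampled. The card does not say how the weights are applied, so the sketch below is an interpretation: it treats `samples × weight` as the effective size and normalises to mixing shares. `mixing_proportions` is an illustrative helper.

```python
# (dataset, samples, weight) rows copied from the Dataset Mix table.
MIX = [
    ("Roman1111111/claude-opus-4.6-10000x", 10_000, 2.5),
    ("WithinUsAI/GPT5.5_thinking_max_distill_god_seed_25K", 25_000, 2.0),
    ("HuggingFaceH4/MATH-500", 500, 2.0),
    ("lighteval/MATH-Hard", 10_000, 2.0),
    ("garage-bAInd/Open-Platypus", 25_000, 1.8),
    ("iamtarun/python_code_instructions_18k_alpaca", 8_000, 1.6),
    ("b-mc2/sql-create-context", 6_000, 1.4),
    ("nvidia/OpenCodeInstruct", 30_000, 1.5),
    ("teknium/OpenHermes-2.5", 30_000, 1.5),
    ("Amod/mental_health_counseling_conversations", 5_000, 1.2),
    ("ray0rf1re/FineWeb-Nano", 50_000, 1.0),
    ("tonytins/chat-dataset", 10_000, 1.0),
    ("databricks/databricks-dolly-15k", 15_000, 1.0),
    ("mlabonne/guanaco-llama2-1k", 1_000, 1.0),
    ("ray0rf1re/hyper-pip", 20_000, 2.0),
    ("HuggingFaceH4/ultrachat_200k", 30_000, 1.5),
    ("fka/awesome-chatgpt-prompts", 5_000, 0.8),
]


def mixing_proportions(mix):
    """Return {dataset: share}, where share is samples * weight normalised to sum to 1."""
    effective = {name: samples * weight for name, samples, weight in mix}
    total = sum(effective.values())
    return {name: value / total for name, value in effective.items()}


shares = mixing_proportions(MIX)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{share:6.1%}  {name}")
```

Under this reading, the unweighted web text and the up-weighted reasoning set end up with similar shares even though their raw sample counts differ by a factor of two.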
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ray0rf1re/Nano-nano_v4.5",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ray0rf1re/Nano-nano_v4.5")


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Wrap the request in the instruction/response template.
    text = f"### Instruction:\n{prompt}\n### Response:\n"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_ids = out[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True).strip()


# Examples
print(generate("Write a Python function to reverse a linked list."))
print(generate("What is the capital of France?"))
print(generate("Explain gradient descent in simple terms."))
```
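Because the context window is 2 048 tokens, a long prompt leaves less room for generation. The helpers below are a small sketch, not part of the model's API: `build_prompt` reproduces the instruction/response template from the snippet above, and `clamp_new_tokens` (a hypothetical name) caps the requested generation length so prompt plus output stays inside the window.

```python
CONTEXT_LENGTH = 2048  # context length from Quick Facts


def build_prompt(instruction: str) -> str:
    """Format a request with the card's instruction/response template."""
    return f"### Instruction:\n{instruction}\n### Response:\n"


def clamp_new_tokens(prompt_tokens: int, requested: int = 256,
                     context: int = CONTEXT_LENGTH) -> int:
    """Largest generation length that keeps prompt + output within the context."""
    remaining = context - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, which fills the {context}-token context"
        )
    return min(requested, remaining)


print(repr(build_prompt("What is the capital of France?")))
print(clamp_new_tokens(prompt_tokens=100))   # short prompt: full request fits
print(clamp_new_tokens(prompt_tokens=2000))  # long prompt: request is reduced
```

In practice `prompt_tokens` would be `inputs["input_ids"].shape[-1]` from the tokenizer call, and the result would be passed as `max_new_tokens`.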
Limitations
- Context limited to 2 048 tokens, so it is unsuitable for long documents
- Trained on only 0.08 B tokens, far less than production models
- May hallucinate on obscure or out-of-distribution queries
- Not RLHF/DPO aligned; outputs may vary in safety and tone
- Pascal GPU constraint (GTX 1080): fp32/fp16 only, no bf16
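The token-budget limitation can be quantified from the training settings. This is simple arithmetic on the reported figures (8 sequences of 2048 tokens per step, 5092 steps, ~255.7 M parameters); the variable names are illustrative.

```python
SEQUENCES_PER_STEP = 8       # effective batch, from the training table
SEQUENCE_LENGTH = 2048       # tokens per sequence
STEPS = 5092                 # optimizer steps
PARAMETERS = 255.7e6         # approximate parameter count

tokens_per_step = SEQUENCES_PER_STEP * SEQUENCE_LENGTH
total_tokens = tokens_per_step * STEPS
tokens_per_parameter = total_tokens / PARAMETERS

print(f"tokens/step:      {tokens_per_step:,}")
print(f"total tokens:     {total_tokens:,} (~{total_tokens / 1e9:.2f} B)")
print(f"tokens/parameter: {tokens_per_parameter:.2f}")
```

That works out to roughly a third of a token per parameter. For comparison, a widely cited compute-optimal rule of thumb is on the order of 20 tokens per parameter, which helps explain the high training loss and low eval scores.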
Citation
```bibtex
@misc{nano-nano-v45,
  author       = {ray0rf1re},
  title        = {Nano-nano v4.5: Compact LLaMA-Family Causal LM},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {https://huggingface.co/ray0rf1re/Nano-nano_v4.5},
}
```
