Gemma4 26B MoE: Hermes Tool-Use Reasoning LoRA
A LoRA adapter fine-tuned from google/gemma-4-26B-A4B-it on the Hermes Reasoning Tool Use dataset, using a 10K-example subset of tool-calling conversations with reasoning traces. Trained by UKA (Hermes Agent).
Summary
| Detail | Value |
|---|---|
| Base Model | google/gemma-4-26B-A4B-it (26B MoE, 128 experts) |
| Dataset | interstellarninja/hermes_reasoning_tool_use (10K subset of 51K) |
| Method | Custom NF4 per-expert quantization + LoRA |
| Pipeline | AndriejusNak/gemma4-26b-moe-finetune |
| GPU | NVIDIA RTX 5090 32GB (Vast.ai Cloud) |
| Training Time | 346 minutes (~5h 46m) |
| Best Loss (epoch-2 average) | 0.5443 |
| NaN Explosions | 0 |
Hardware
| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 5090 32GB GDDR7 |
| CPU | Intel Core i7-14700K (20 cores / 28 threads) |
| RAM | 94 GB DDR5 |
| Disk | 200 GB NVMe SSD |
| Cloud | Vast.ai |
| PyTorch | 2.12.0.dev (nightly, cu128) |
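The following back-of-the-envelope sketch shows why a single 32 GB card needs 4-bit weights for this model. It assumes a nominal 26e9 parameters and 4 bits per NF4 weight. It ignores quantization scales, LoRA weights, optimizer state, and activations, all of which add real overhead, so treat the numbers as lower bounds.

```python
# Rough weight-memory estimate for a nominal 26B-parameter model.
# Assumptions: exactly 26e9 weights; NF4 counted as 4 bits per weight;
# quantization scales, activations and optimizer state are ignored.
GIB = 1024 ** 3
N_PARAMS = 26e9
VRAM_GIB = 32

def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for the raw weights alone, in GiB."""
    return n_params * bits_per_param / 8 / GIB

bf16_gib = weight_gib(N_PARAMS, 16)  # about 48.4 GiB
nf4_gib = weight_gib(N_PARAMS, 4)    # about 12.1 GiB

for name, gib in [("bf16", bf16_gib), ("nf4", nf4_gib)]:
    verdict = "fits" if gib < VRAM_GIB else "does not fit"
    print(f"{name}: {gib:.1f} GiB -> {verdict} in {VRAM_GIB} GiB of VRAM")
```

Only the 4-bit variant leaves headroom on the card for activations at seq=1536 and for the LoRA parameters and their optimizer state.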
Training Configuration
```python
# Excerpt from v6_26b_pipeline.py
MODEL_NAME = "google/gemma-4-26B-A4B-it"
MAX_SEQ_LENGTH = 1536  # Longer, to fit tool definitions plus conversations
LORA_R = 32
LORA_ALPHA = 32        # alpha / r = 1.0, so LoRA updates are not rescaled
INCLUDE_MLP_LORA = True
SFT_EPOCHS = 2
SFT_BATCH_SIZE = 2     # Reduced for seq=1536
SFT_GRAD_ACCUM = 8     # Effective batch = 2 x 8 = 16
SFT_LR = 2e-5
SFT_FILES = ["data/hermes_tool_10k.jsonl"]
```
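The batch settings above determine how many optimizer steps a run takes. This sketch assumes exactly 10,000 training examples, none dropped for exceeding MAX_SEQ_LENGTH, and assumes that the "Step" counter in the loss log counts optimizer steps. Under those assumptions it gives 625 steps per epoch, which is consistent with the log's epoch-1 entries ending near step 600 and epoch 2 running through step 1200.

```python
# Step-count arithmetic for the configuration above.
# Assumption: exactly 10,000 examples (the "10K subset"), none filtered out.
import math

N_EXAMPLES = 10_000
SFT_BATCH_SIZE = 2
SFT_GRAD_ACCUM = 8
SFT_EPOCHS = 2
LORA_R, LORA_ALPHA = 32, 32

effective_batch = SFT_BATCH_SIZE * SFT_GRAD_ACCUM          # sequences per optimizer step
steps_per_epoch = math.ceil(N_EXAMPLES / effective_batch)  # optimizer steps per pass
total_steps = steps_per_epoch * SFT_EPOCHS
lora_scaling = LORA_ALPHA / LORA_R                         # LoRA update is multiplied by alpha / r

print(effective_batch, steps_per_epoch, total_steps, lora_scaling)  # 16 625 1250 1.0
```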
LoRA Details
- Rank (r): 32, Alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj (attention) + gate_proj, up_proj, down_proj (MLP)
- Trainable params: 59,275,776 / 3,027,224,428 (1.96%)
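Standard LoRA adds two low-rank factors to each targeted linear layer W of shape (out × in): A with shape (r × in) and B with shape (out × r). Each targeted layer therefore gains r · (in + out) trainable weights. The layer size below is hypothetical and chosen only for illustration; it is not Gemma 4's actual dimension. The second part checks the reported trainable fraction against the reported counts.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable weights LoRA adds to one linear layer: A (r x in) plus B (out x r)."""
    return r * (in_features + out_features)

# Hypothetical 2048 x 2048 projection, for illustration only.
print(lora_params(in_features=2048, out_features=2048, r=32))  # 131072

# Trainable fraction from the counts reported above.
TRAINABLE = 59_275_776
TOTAL = 3_027_224_428
pct = 100 * TRAINABLE / TOTAL
print(f"{pct:.2f}%")  # 1.96%
```

Because the cost grows linearly in r, doubling the rank to 64 would double the adapter's trainable parameters and file size.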
Loss Progression
```text
Step 50: Loss 3.0767 (epoch 1)
Step 100: Loss 1.0241
Step 150: Loss 0.7901
...
Step 600: Loss 0.5698
Epoch 1 avg: 0.8616
Step 750: Loss 0.5277 (epoch 2)
Step 900: Loss 0.5500
Step 1050: Loss 0.5407
Step 1200: Loss 0.5126
Epoch 2 avg: 0.5443 (best)
```
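Individual logged losses can dip below the epoch average. In the excerpt, step 1200 reaches 0.5126, while the epoch-2 average is 0.5443. The "best" figure is therefore the epoch average. The following small sketch pulls (step, loss) pairs out of a log in this format. It assumes every entry matches `Step N: Loss X`, and it only sees the lines shown, not the elided ones.

```python
import re

# Excerpt from the log above; the elided "..." lines are not available.
LOG = """\
Step 50: Loss 3.0767 (epoch 1)
Step 100: Loss 1.0241
Step 150: Loss 0.7901
Step 600: Loss 0.5698
Step 750: Loss 0.5277 (epoch 2)
Step 900: Loss 0.5500
Step 1050: Loss 0.5407
Step 1200: Loss 0.5126
"""

STEP_RE = re.compile(r"Step (\d+): Loss ([0-9.]+)")

def parse_losses(text: str) -> list[tuple[int, float]]:
    """Return (step, loss) pairs for every 'Step N: Loss X' entry."""
    return [(int(step), float(loss)) for step, loss in STEP_RE.findall(text)]

points = parse_losses(LOG)
best_step, best_loss = min(points, key=lambda p: p[1])
print(best_step, best_loss)  # 1200 0.5126
```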
Usage
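```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE_ID = "google/gemma-4-26B-A4B-it"
ADAPTER_ID = "hotdogs/gemma4-26b-hermes-tool-reasoning-lora"

# bf16 weights for a 26B model take roughly 52 GB; device_map="auto" places what
# fits on the GPU(s) and offloads the remaining layers to CPU RAM.
model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

messages = [
    {"role": "system", "content": "You are a function calling AI. Use tools when needed."},
    {"role": "user", "content": "Search for the latest papers on MoE models."},
]
# return_dict=True returns input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```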
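To act on the model's output, the tool calls must be separated from the reasoning. This minimal stdlib sketch assumes the adapter reproduces Hermes-style markup, with reasoning inside `<think>...</think>` and each call as a JSON object inside `<tool_call>...</tool_call>`. That tag format is an assumption based on the Hermes dataset conventions and should be checked against real generations. The tool name `search_papers` is hypothetical.

```python
import json
import re

# Assumed Hermes-style markup; verify against the adapter's actual output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
THINK_RE = re.compile(r"<think>\s*(.*?)\s*</think>", re.DOTALL)

def parse_response(text: str) -> dict:
    """Split a generated response into reasoning text and parsed tool calls."""
    think = THINK_RE.search(text)
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            # Keep malformed calls so the caller can log them or re-prompt
            calls.append({"error": "invalid_json", "raw": raw})
    return {"reasoning": think.group(1) if think else None, "tool_calls": calls}

# Hypothetical model output, for illustration only.
sample = (
    "<think>The user wants recent MoE papers; call the search tool.</think>\n"
    '<tool_call>\n{"name": "search_papers", "arguments": {"query": "mixture of experts"}}\n</tool_call>'
)
result = parse_response(sample)
print(result["tool_calls"][0]["name"])  # search_papers
```

If the tags never appear in decoded text, they may be registered as special tokens. In that case, decoding with `skip_special_tokens=False` would keep them.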
Files
- adapter_model.safetensors: LoRA weights (227 MB)
- adapter_config.json: r=32, alpha=32
- tokenizer.json: Gemma 4 tokenizer (31 MB)
- v6_26b_pipeline.py: Training script
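The adapter file size can be checked against the trainable-parameter count. In this sketch, 59,275,776 parameters stored at 4 bytes each come to about 237.1 MB, or about 226.1 MiB, which is close to the listed 227 MB. That suggests, without confirming, that the LoRA weights are saved in fp32 rather than bf16. The safetensors header is ignored.

```python
# Expected adapter size for the reported LoRA parameter count at two precisions.
# Safetensors header bytes are ignored.
N_LORA_PARAMS = 59_275_776
MIB = 1024 ** 2

def adapter_bytes(n_params: int, bytes_per_param: int) -> int:
    """Raw tensor payload size in bytes."""
    return n_params * bytes_per_param

for dtype, nbytes in [("fp32", 4), ("bf16", 2)]:
    size = adapter_bytes(N_LORA_PARAMS, nbytes)
    print(f"{dtype}: {size / 1e6:.1f} MB ({size / MIB:.1f} MiB)")
```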
Credits
- Base Model: Google Gemma 4 26B
- Dataset: interstellarninja/hermes_reasoning_tool_use
- Pipeline: AndriejusNak/gemma4-26b-moe-finetune
- Trainer: UKA (Hermes Agent)
