Qwen3-0.6B-ToolCalling-Claude-4.6-Opus-Distilled-v1
This model is a fine-tune of Qwen/Qwen3-0.6B oriented toward function calling and reasoning, trained with a two-stage recipe:
- SFT on mixed tool-calling + reasoning data
- DPO to improve call-vs-no-call decisions
Model Summary
- Base model: Qwen/Qwen3-0.6B
- Method: LoRA SFT + LoRA DPO
- Primary goal: improve tool invocation decisions and argument formatting while keeping concise reasoning behavior
- Final artifacts in this repo:
  - lora_adapter/* (adapter weights)
  - merged_safetensors/* (merged full model)
  - merged_gguf/model-f16.gguf
Training Data
Training used the following public datasets:
- Salesforce/xlam-function-calling-60k
- nvidia/When2Call
- Roman1111111/claude-opus-4.6-10000x
- Crownelius/Opus-4.6-Reasoning-3300x
Local merged split sizes:
- SFT train: 91,048
- SFT val: 1,858
- DPO train: 8,550
- DPO val: 450
Training Procedure
Stage 1: SFT
- LoRA: r=64, alpha=128, dropout=0.05, target_modules=all-linear
- Sequence length: 16384
- Epochs: 3
- Per-device batch size: 2
- Gradient accumulation: 16
- LR: 2e-4 (cosine, warmup ratio 0.05)
- bf16 + gradient checkpointing
- Best SFT eval loss: 0.126719 (checkpoint-8500)
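The batch settings above imply an effective batch size and an approximate optimizer step count. The sketch below works through that arithmetic. It assumes a single GPU and no sequence packing, neither of which this card states explicitly, and trainer-specific rounding may shift the totals by a few steps.

```python
import math

# Values taken from the SFT settings and split sizes listed in this card.
sft_train_examples = 91_048
per_device_batch_size = 2
gradient_accumulation = 16
epochs = 3

# Assumption (not stated in the card): one GPU, no sequence packing.
num_gpus = 1

# Sequences consumed per optimizer step.
effective_batch = per_device_batch_size * gradient_accumulation * num_gpus

# Optimizer steps per epoch, rounding the last partial batch up.
steps_per_epoch = math.ceil(sft_train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch)   # 32
print(steps_per_epoch)   # 2846
print(total_steps)       # 8538
```

Under these assumptions, checkpoint-8500 would fall late in the third epoch.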
Stage 2: DPO
- LoRA: r=32, alpha=64, dropout=0.05, target_modules=all-linear
- beta=0.1, loss_type=sigmoid
- max_length=1024, max_prompt_length=512
- Per-device batch size: 1
- Gradient accumulation: 32
- LR: 5e-7 (cosine, warmup ratio 0.03)
- bf16 + gradient checkpointing
- precompute_ref_log_probs=true, precompute_ref_batch_size=1
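To show what beta=0.1 and loss_type=sigmoid control, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair. It is not the training code used for this model: the function name and the log-probability values are hypothetical, and a real trainer computes the same quantity over batches of token-level log-probs.

```python
import math

def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one pair, given summed sequence log-probs."""
    # How much more the policy prefers the chosen answer than the reference does.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Hypothetical log-probs: policy identical to the reference.
loss_at_start = dpo_sigmoid_loss(-10.0, -12.0, -10.0, -12.0)

# Hypothetical log-probs: policy now favors the chosen answer more strongly.
loss_after_update = dpo_sigmoid_loss(-8.0, -15.0, -10.0, -12.0)

print(loss_at_start, loss_after_update)
```

When the policy matches the reference the loss equals log 2, and it falls as the policy widens the chosen-vs-rejected margin relative to the reference. The reference terms do not change during training, which is why they can be computed once up front, as precompute_ref_log_probs=true does.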
Runtime Environment
- GPU: NVIDIA GeForce RTX 5090 (31.37 GiB)
- CUDA runtime (PyTorch): 12.8
- PyTorch: 2.11.0+cu128
- TRL: 1.0.0
- PEFT: 0.18.1
- Transformers: 5.5.1
Evaluation
Evaluation run date: 2026-04-12 (UTC)
Command:
python eval/eval_fc.py \
--model_path ./outputs/final_weights/merged_safetensors \
--eval_type all \
--device cuda
Results:
- Built-in function-calling checks (4 cases):
  - Format correctness: 3/4 (75%)
  - Tool selection: 3/4 (75%)
- When2Call MCQ (sample=100 from test/mcq):
  - Accuracy: 67/100 (67%)
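Because these scores come from small samples, it may help to see how wide the statistical uncertainty is. The sketch below computes an approximate 95% Wilson score interval for each reported ratio. This is a generic statistical calculation, not part of eval/eval_fc.py, and the function name is illustrative.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width

# When2Call MCQ: 67 correct out of 100 sampled items.
w2c_low, w2c_high = wilson_interval(67, 100)
print(f"When2Call MCQ: [{w2c_low:.2f}, {w2c_high:.2f}]")

# Built-in checks: 3 correct out of 4 handcrafted cases.
fc_low, fc_high = wilson_interval(3, 4)
print(f"Built-in checks: [{fc_low:.2f}, {fc_high:.2f}]")
```

The When2Call interval spans roughly 0.57 to 0.75, and the interval for the 4-case check covers most of the range from 0.3 to 0.95, so neither number should be read as a precise estimate.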
Usage
Transformers (merged model)
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "diverWayne/Qwen3-0.6B-ToolCalling-Claude-4.6-Opus-Distilled-v1"

# The merged full model lives in the merged_safetensors/ subfolder of the repo.
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="merged_safetensors", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, subfolder="merged_safetensors", trust_remote_code=True)
LoRA adapter
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model first, then attach the adapter from the lora_adapter/ subfolder.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "diverWayne/Qwen3-0.6B-ToolCalling-Claude-4.6-Opus-Distilled-v1", subfolder="lora_adapter")
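Parsing tool calls from generated text

With either variant loaded, the generated text still has to be checked for tool calls. The sketch below assumes the model keeps the Qwen3 chat-template convention of wrapping each call in <tool_call> tags around a JSON object with "name" and "arguments" keys. This card does not confirm the output format, and the helper name, the get_weather tool, and the sample strings are hypothetical.

```python
import json
import re

# Assumed format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Return the well-formed tool calls found in generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed JSON: treat as a formatting failure
        # Keep only objects that have a name and a dict of arguments.
        if (isinstance(payload, dict)
                and isinstance(payload.get("name"), str)
                and isinstance(payload.get("arguments"), dict)):
            calls.append(payload)
    return calls

# Hypothetical model output containing a reasoning block and one tool call.
sample_output = (
    "<think>\nThe user wants the weather, so a tool call is needed.\n</think>\n\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)

print(extract_tool_calls(sample_output))
print(extract_tool_calls("Which city do you mean?"))
```

An empty list means the model answered in plain text, which is the desired outcome when the request lacks required arguments or no listed tool applies.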
Limitations
- This model can still over-call tools when the user input lacks required slots.
- The built-in check includes only 4 handcrafted cases and should not be treated as a benchmark.
- The When2Call score comes from a quick 100-sample evaluation, not from scoring the full test set.
- Outputs may inherit bias and errors from source datasets and synthetic data.
License
This repository is released under Apache-2.0.
Please separately verify the licenses/terms of each training dataset and upstream base model before commercial use.
