Carnice Qwen3.6 MoE 35B-A3B — Hermes-Focused Agentic Model (GGUF)
QLoRA fine-tune of Qwen3.6-35B-A3B (MoE, 3B active parameters) optimized for agentic workflows and Hermes Agent runtime. Two-stage training adapted from kai-os/Carnice-9b.
This is the successor to Carnice-MoE-35B-A3B (based on Qwen3.5), retrained on the newer Qwen3.6 base which brings improved agentic coding, extended context (262K native, up to 1M with RoPE scaling), and native multimodal support.
Credits
Training methodology adapted from kai-os/Carnice-9b — same two-stage approach and datasets, applied to the larger MoE architecture. Key inspiration: training on actual Hermes Agent execution traces for native agentic behavior.
Available Quantizations
| Quantization | Size | Min VRAM |
|---|---|---|
| F16 | 65 GB | 1x 98GB GPU |
| Q8_0 | 35 GB | 1x 48GB GPU |
| Q6_K | 27 GB | 1x 32GB GPU |
| Q5_K_M | 24 GB | 1x 32GB GPU |
| Q4_K_M | 20 GB | 1x 24GB GPU |
| Q4_K_S | 19 GB | 1x 24GB GPU |
For BF16 safetensors, see samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.6-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |
| Native Context Length | 262,144 tokens |
| Thinking Modes | Thinking / Non-thinking (native Qwen3.6) |
What Makes This Different
Unlike generic reasoning distillation, this model was trained on actual Hermes Agent execution traces — real conversations where an AI agent:
- Executes terminal commands and processes output
- Performs file editing operations
- Chains multi-step tool calls with results feeding back
- Uses browser-assisted workflows
- Makes decisions based on environmental feedback
This teaches the model the exact conversation patterns Hermes expects, rather than just generic reasoning.
Training Details
Two-Stage Approach
Stage A — Reasoning Repair (1 epoch)
- Strengthens base model reasoning before agent-specific training
- Loss: 0.4281
| Dataset | Examples |
|---|---|
| bespokelabs/Bespoke-Stratos-17k | 16,710 |
| AI-MO/NuminaMath-CoT | 17,000 (capped) |
Stage B — Hermes Traces (2 epochs)
- Agent-specific behavioral training on real execution traces
- Loss: 0.3045
| Dataset | Examples |
|---|---|
| kai-os/carnice-glm5-hermes-traces | 1,627 (high quality) |
| open-thoughts/OpenThoughts-Agent-v1-SFT | 15,209 |
Training Configuration
| Parameter | Stage A | Stage B |
|---|---|---|
| LoRA Rank | 64 | 64 |
| LoRA Alpha | 64 | 64 |
| LoRA Targets | q, k, v, o projections | q, k, v, o projections |
| Learning Rate | 2e-5 (linear) | 1e-5 (cosine) |
| Epochs | 1 | 2 |
| Effective Batch | 12 | 12 |
| Context Length | 4096 | 4096 |
| Precision | 4-bit QLoRA + BF16 adapters | Same |
| GPU | RTX PRO 6000 Blackwell (98GB) | Same |
| Total Training Time | ~55 hours (both stages) |
Trainable Parameters
13,762,560 (0.04% of 35.1B total)
Usage with llama.cpp
# Download a quantization (e.g., Q8_0)
huggingface-cli download samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B-GGUF \
Carnice-Qwen3.6-MoE-35B-A3B-Q8_0.gguf --local-dir .
# Run with llama-server
llama-server \
--model Carnice-Qwen3.6-MoE-35B-A3B-Q8_0.gguf \
--n-gpu-layers -1 \
--ctx-size 262144 \
--host 0.0.0.0 --port 8000
Acknowledgements
- kai-os — Carnice training methodology and Hermes traces dataset
- open-thoughts — Agent SFT dataset
- bespokelabs — Bespoke-Stratos reasoning dataset
- Unsloth — QLoRA training framework
- Qwen — Base model
- Downloads last month
- 1,573
4-bit
5-bit
6-bit
8-bit
16-bit
Model tree for samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B-GGUF
Base model
Qwen/Qwen3.6-35B-A3B