Qwable-9B-Claude-Fable-5-GGUF
Developed by Empero
GGUF quantizations of empero-ai/Qwable-9B-Claude-Fable-5
for llama.cpp, Ollama, LM Studio, and other GGUF runtimes. This repo
ships a vision projector (mmproj), so the model runs as a full multimodal (image + text) assistant,
not just text.
Qwable-9B-Claude-Fable-5 is a full-parameter fine-tune of Qwen3.5-9B on agentic coding and reasoning traces distilled from Claude Fable 5 and a GPT-5.5 terminal agent. For full training details and the complete evaluation, see the base model card.
Early release. Strong coding and agentic behavior out of the box; a full benchmark suite is underway and will be published. See Provenance & licensing.
Files
Text weights: pick one quant
| File | Quant | Size | Notes |
|---|---|---|---|
| Qwable-9B-Claude-Fable-5-Q4_K_M.gguf | Q4_K_M | 5.3 GB | recommended default: smallest, runs on ~6 to 8 GB VRAM |
| Qwable-9B-Claude-Fable-5-Q5_K_M.gguf | Q5_K_M | 6.1 GB | balanced quality / size |
| Qwable-9B-Claude-Fable-5-Q6_K.gguf | Q6_K | 6.9 GB | high quality |
| Qwable-9B-Claude-Fable-5-Q8_0.gguf | Q8_0 | 8.9 GB | near-lossless |
| Qwable-9B-Claude-Fable-5-bf16.gguf | BF16 | 17 GB | full precision (conversion base) |
Vision projector: for image input
| File | Size | Notes |
|---|---|---|
| mmproj-Qwable-9B-Claude-Fable-5-f16.gguf | 876 MB | CLIP vision encoder; required for images, pairs with any quant above |
Text-only use needs just a quant. For image understanding, download both a text quant and the mmproj.
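The pairing rule above can be written down as a small helper. This is a minimal sketch, not an official download script: the function names `files_for` and `download` are hypothetical, the repo id is the one used in the Ollama command in this card, and the download step assumes the `huggingface_hub` package is installed and that network access is available.

```python
REPO_ID = "empero-ai/Qwable-9B-Claude-Fable-5-GGUF"
QUANTS = {"Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0", "bf16"}  # suffixes from the file table
MMPROJ = "mmproj-Qwable-9B-Claude-Fable-5-f16.gguf"


def files_for(quant: str = "Q4_K_M", vision: bool = False) -> list[str]:
    """Return the GGUF file names needed for a text-only or multimodal setup."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {sorted(QUANTS)}")
    files = [f"Qwable-9B-Claude-Fable-5-{quant}.gguf"]
    if vision:
        # Image input needs the projector in addition to one text quant.
        files.append(MMPROJ)
    return files


def download(quant: str = "Q4_K_M", vision: bool = False) -> list[str]:
    """Fetch the files from the Hub and return their local cache paths."""
    # Imported lazily so files_for() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name)
        for name in files_for(quant, vision)
    ]
```

Calling `files_for("Q6_K", vision=True)` lists one text quant plus the mmproj, which is the pair the multimodal command in the Usage section expects.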
Usage
llama.cpp: text
llama-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf --jinja \
-p "Write a Python function that merges two sorted lists." \
--temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.05 -n 2048
llama.cpp: multimodal (image + text)
llama-mtmd-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf \
--mmproj mmproj-Qwable-9B-Claude-Fable-5-f16.gguf \
--image photo.jpg -p "Describe this image." \
--temp 0.6 --top-p 0.95 --top-k 20 -n 512
Ollama
ollama run hf.co/empero-ai/Qwable-9B-Claude-Fable-5-GGUF:Q4_K_M
Or via a Modelfile (pulls in the vision projector for image support):
FROM ./Qwable-9B-Claude-Fable-5-Q4_K_M.gguf
FROM ./mmproj-Qwable-9B-Claude-Fable-5-f16.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
Sampling & output format
- Sampling (Qwen3.5 recommended): general tasks temp 1.0, precise coding temp 0.6; top_p 0.95, top_k 20, min_p 0. Use repeat_penalty 1.05 (a small bump from Qwen's default 1.0) to avoid rare non-terminating reasoning loops, and allow a generous -n / max_new_tokens.
- Reasoning model: every response opens with a <think>...</think> block before the final answer; parse and strip that span for end users.
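The two notes above can be sketched in a few lines of Python. The preset values are copied from the sampling note; the dictionary key names and the `split_reasoning` helper are hypothetical and would need mapping onto whatever your runtime calls these options. The handling of a missing closing tag is an assumption about truncated generations, not documented model behavior.

```python
# Presets from the sampling note above; key names are illustrative only.
_SHARED = {"top_p": 0.95, "top_k": 20, "min_p": 0.0, "repeat_penalty": 1.05}
SAMPLING = {
    "general": {"temperature": 1.0, **_SHARED},
    "coding": {"temperature": 0.6, **_SHARED},
}


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Everything up to the first </think> is treated as reasoning and the
    remainder as the user-facing answer.
    """
    head, closing, tail = text.partition("</think>")
    if closing:
        # Drop the opening tag if present; some chat templates emit it in
        # the prompt, so the completion may contain only the closing tag.
        return head.replace("<think>", "", 1).strip(), tail.strip()
    if "<think>" in text:
        # Assumption: an unterminated block means the output was cut off
        # mid-reasoning, so there is no answer to show yet.
        return text.split("<think>", 1)[1].strip(), ""
    return "", text.strip()
```

Showing only the second element of the returned tuple to end users keeps the reasoning available for logging while hiding it from the interface. An empty answer is a hint that the output length limit was too small.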
Model details
- Developed by: Empero
- Base model: Qwen3.5-9B β a dense, natively multimodal model with a hybrid attention stack (3:1 Gated DeltaNet linear-attention to Gated full-attention), ~152k vocabulary, long native context.
- Fine-tune type: full parameter (all text-backbone weights trained), assistant-only loss. The vision tower was left unchanged from the base, so vision works (via the included mmproj) but was inherited, not specifically tuned.
- Format: GGUF (text quants + CLIP mmproj), converted and quantized with llama.cpp.
- Languages: primarily English.
Evaluation
The evaluation below was measured on the unquantized fine-tune. Quantized variants are very close at Q8_0/Q6_K and degrade gradually toward Q4_K_M; expect a small quality drop at the lower quants.
Training quality was tracked via held-out validation loss / token-accuracy on a 100-example split (80% Fable / 20% terminal), plus a qualitative generation review:
| Step | eval loss | eval token-acc |
|---|---|---|
| 100 | 0.743 | 0.784 |
| 300 (≈ epoch 1) | 0.714 | 0.791 |
| 500 | 0.713 | 0.791 |
No overfitting: held-out loss decreased and then plateaued (~0.71) through epoch 2; it never rose, even as
train loss fell to ~0.64. In a 34-prompt qualitative review, roughly 27/34 responses were clean and
correct, strongest on coding and terminal/agentic tasks: current tooling (ss over netstat,
git-filter-repo, Argon2id) with security-aware judgment (rotating a leaked key first, constant-time
comparison). Full transcripts: sample_generations.md.
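To make the numbers above easier to read, the following sketch converts each eval loss to a perplexity and checks the "never rose" claim across the three reported checkpoints. It assumes the reported loss is mean per-token cross-entropy in nats, which the card does not state explicitly; the helper names are hypothetical.

```python
import math

# (step, eval loss, eval token accuracy) copied from the table above.
EVAL = [(100, 0.743, 0.784), (300, 0.714, 0.791), (500, 0.713, 0.791)]


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is per-token cross-entropy in nats."""
    return math.exp(loss)


def never_rises(losses: list[float]) -> bool:
    """True if each checkpoint's loss is no higher than the previous one."""
    return all(later <= earlier for earlier, later in zip(losses, losses[1:]))


losses = [loss for _, loss, _ in EVAL]
for step, loss, acc in EVAL:
    print(f"step {step}: loss {loss:.3f}, perplexity {perplexity(loss):.2f}, token-acc {acc:.1%}")
print("held-out loss never rises:", never_rises(losses))
print(f"qualitative review: about {27 / 34:.0%} of 34 prompts clean")
```

With only three checkpoints this is a coarse check, and the 100-example split is small, so differences in the third decimal place should not be over-read.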
Limitations
- Reasoning model. Each response opens with a <think> block; strip it for end users and allow generous output length. Use repeat_penalty ≈ 1.05 for consistently crisp completions.
- Strongest within its domain (coding / agentic / reasoning). For general-knowledge or long-form factual questions, verify specifics as with any 9B model.
- Reflects its base and teachers. A distillation fine-tune of Qwen3.5-9B on Claude Fable 5 and GPT-5.5 traces; it carries their style and limits and received no extra safety tuning. Add your own review/safety layer for production.
- Quantization. Lower quants (esp. Q4_K_M) trade a little accuracy for size; use Q6_K/Q8_0 when quality matters most.
Quantization
Converted from the fine-tuned weights with llama.cpp convert_hf_to_gguf.py, then quantized with
llama-quantize. The BF16 GGUF is the conversion base; the K-quants are derived from it. The mmproj is the
base Qwen3.5-VL vision encoder (unchanged by fine-tuning). All files were verified to load and generate
in llama.cpp: text (code, reasoning) and image understanding were both confirmed.
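The convert-then-quantize flow described above can be sketched as a list of commands. This is an illustration under assumptions, not the exact invocation used for this release: the directory arguments and the `build_commands` / `run` helpers are hypothetical, and the card does not state which flags were passed. The sketch covers only the text weights and leaves out the mmproj export.

```python
import subprocess
from pathlib import Path

# Quant types listed in the file table; each is derived from the BF16 base.
QUANT_TYPES = ["Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]


def build_commands(hf_dir: str, out_dir: str,
                   name: str = "Qwable-9B-Claude-Fable-5") -> list[list[str]]:
    """Build the conversion command followed by one llama-quantize call per quant."""
    out = Path(out_dir)
    base = out / f"{name}-bf16.gguf"
    commands = [
        # Step 1: Hugging Face checkpoint -> BF16 GGUF (the conversion base).
        ["python", "convert_hf_to_gguf.py", str(hf_dir),
         "--outfile", str(base), "--outtype", "bf16"],
    ]
    for quant in QUANT_TYPES:
        # Step 2: quantize the BF16 GGUF; arguments are input, output, type.
        target = out / f"{name}-{quant}.gguf"
        commands.append(["llama-quantize", str(base), str(target), quant])
    return commands


def run(commands: list[list[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to print and review the commands before committing to a multi-gigabyte conversion.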
Provenance & licensing
Weights are released under Apache-2.0, inherited from the Qwen3.5-9B base. The fine-tuning data comes from generated traces of Claude Fable 5 and GPT-5.5 (via the linked public datasets). Because those traces originate from third-party assistants, the providers' terms may apply to downstream training and distillation. If you plan to build on this model commercially, confirm your use aligns with those terms. Shared with the community for research and experimentation, as-is.
Acknowledgements
- Developed and released by Empero
- Base model: Qwen3.5-9B (Alibaba Qwen team)
- Datasets: Glint-Research/Fable-5-traces, Roman1111111/gpt5.5-terminal
- Tooling: llama.cpp, TRL, Transformers