Qwable-9B-Claude-Fable-5-GGUF

Developed by Empero

GGUF quantizations of empero-ai/Qwable-9B-Claude-Fable-5 for llama.cpp, Ollama, LM Studio, and other GGUF runtimes. This repo ships a vision projector (mmproj), so the model runs as a full multimodal (image + text) assistant — not just text.

Qwable-9B-Claude-Fable-5 is a full-parameter fine-tune of Qwen3.5-9B on agentic coding and reasoning traces distilled from Claude Fable 5 and a GPT-5.5 terminal agent. For full training details and the complete evaluation, see the model card of the unquantized fine-tune, empero-ai/Qwable-9B-Claude-Fable-5.

Early release. Strong coding and agentic behavior out of the box; a full benchmark suite is underway and will be published. See Provenance & licensing.

Files

Text weights — pick one quant

File	Quant	Size	Notes
`Qwable-9B-Claude-Fable-5-Q4_K_M.gguf`	Q4_K_M	5.3 GB	recommended default — smallest, runs on ~6–8 GB VRAM
`Qwable-9B-Claude-Fable-5-Q5_K_M.gguf`	Q5_K_M	6.1 GB	balanced quality / size
`Qwable-9B-Claude-Fable-5-Q6_K.gguf`	Q6_K	6.9 GB	high quality
`Qwable-9B-Claude-Fable-5-Q8_0.gguf`	Q8_0	8.9 GB	near-lossless
`Qwable-9B-Claude-Fable-5-bf16.gguf`	BF16	17 GB	full precision (conversion base)
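
As a rough aid for choosing among the files above, the sketch below picks the largest quant whose file fits a given memory budget. The sizes are copied from the table; `pick_quant` is a hypothetical helper, and the `headroom_gb` default is an assumed allowance for KV cache and runtime buffers, not a measured figure. Real memory use depends on context length and backend.

```python
from typing import Optional

# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q4_K_M": 5.3,
    "Q5_K_M": 6.1,
    "Q6_K": 6.9,
    "Q8_0": 8.9,
    "BF16": 17.0,
}


def pick_quant(budget_gb: float, headroom_gb: float = 0.5) -> Optional[str]:
    """Return the largest quant that fits in budget_gb, or None if none fit.

    headroom_gb is an ASSUMED allowance for KV cache and buffers;
    tune it for your context length and runtime.
    """
    usable = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    # max() compares tuples by size first, so this selects the largest file that fits.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for budget in (4, 6, 8, 12, 24):
        print(f"{budget} GB ->", pick_quant(budget))
```

Raising `headroom_gb` makes the choice more conservative, which may be appropriate for long contexts or when the vision projector is loaded alongside the text weights.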

Vision projector — for image input

File	Size	Notes
`mmproj-Qwable-9B-Claude-Fable-5-f16.gguf`	876 MB	CLIP vision encoder; required for images, pairs with any quant above

Text-only use needs just a quant. For image understanding, download both a text quant and the mmproj.
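
The download rule above (one quant for text, plus the mmproj for images) can be sketched in Python. This assumes the `huggingface_hub` package is installed; `files_to_fetch` and `fetch` are hypothetical helper names, while the repo ID and file names are taken from this card.

```python
from typing import List

REPO_ID = "empero-ai/Qwable-9B-Claude-Fable-5-GGUF"
MMPROJ_FILE = "mmproj-Qwable-9B-Claude-Fable-5-f16.gguf"
# Suffixes as they appear in the file names listed in the Files tables.
QUANT_SUFFIXES = ("Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0", "bf16")


def files_to_fetch(quant: str = "Q4_K_M", vision: bool = False) -> List[str]:
    """List the files needed: one text quant, plus the mmproj for image input."""
    if quant not in QUANT_SUFFIXES:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANT_SUFFIXES}")
    files = [f"Qwable-9B-Claude-Fable-5-{quant}.gguf"]
    if vision:
        files.append(MMPROJ_FILE)
    return files


def fetch(quant: str = "Q4_K_M", vision: bool = False) -> List[str]:
    """Download the files and return their local cache paths."""
    # Imported lazily so files_to_fetch() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [hf_hub_download(repo_id=REPO_ID, filename=name)
            for name in files_to_fetch(quant, vision)]


if __name__ == "__main__":
    print(files_to_fetch("Q4_K_M", vision=True))
```

Calling `fetch("Q4_K_M", vision=True)` would download both files into the local Hugging Face cache and return their paths.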

Usage

llama.cpp — text

llama-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf --jinja \
 -p "Write a Python function that merges two sorted lists." \
 --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.05 -n 2048
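
The command above uses the coding temperature. For scripted use, a minimal sketch that assembles the same argument list and switches between the two temperatures listed under Sampling & output format might look like this. `llama_cli_args` is a hypothetical helper; it uses only the flags that appear in the command above.

```python
import shlex
from typing import List

# Temperatures from the "Sampling & output format" section of this card.
TEMPERATURE = {"general": 1.0, "coding": 0.6}


def llama_cli_args(model: str, prompt: str, task: str = "coding",
                   max_tokens: int = 2048) -> List[str]:
    """Build a llama-cli argument list with the card's recommended sampling."""
    if task not in TEMPERATURE:
        raise ValueError(f"task must be one of {sorted(TEMPERATURE)}")
    return [
        "llama-cli", "-m", model, "--jinja",
        "-p", prompt,
        "--temp", str(TEMPERATURE[task]),
        "--top-p", "0.95",
        "--top-k", "20",
        "--repeat-penalty", "1.05",
        "-n", str(max_tokens),
    ]


if __name__ == "__main__":
    args = llama_cli_args(
        "Qwable-9B-Claude-Fable-5-Q4_K_M.gguf",
        "Write a Python function that merges two sorted lists.",
    )
    # Print a shell-safe command line; pass `args` to subprocess.run() to execute it.
    print(shlex.join(args))
```

Passing the list directly to `subprocess.run(args, check=True)` avoids shell quoting problems with prompts that contain quotes or newlines.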

llama.cpp — multimodal (image + text)

llama-mtmd-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf \
 --mmproj mmproj-Qwable-9B-Claude-Fable-5-f16.gguf \
 --image photo.jpg -p "Describe this image." \
 --temp 0.6 --top-p 0.95 --top-k 20 -n 512

Ollama

ollama run hf.co/empero-ai/Qwable-9B-Claude-Fable-5-GGUF:Q4_K_M

Or via a Modelfile (pulls in the vision projector for image support):

FROM ./Qwable-9B-Claude-Fable-5-Q4_K_M.gguf
FROM ./mmproj-Qwable-9B-Claude-Fable-5-f16.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05

Sampling & output format

Sampling (Qwen3.5 recommended): general tasks temp 1.0, precise coding temp 0.6; top_p 0.95, top_k 20, min_p 0. Use repeat_penalty 1.05 (a small bump from Qwen's default 1.0) to avoid rare non-terminating reasoning loops, and allow generous -n / max_new_tokens.
Reasoning model: every response opens with a <think>...</think> block before the final answer — parse and strip that span for end users.
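
A minimal sketch of the parse-and-strip step, assuming the raw completion text contains the literal `<think>` and `</think>` tags. `split_reasoning` is a hypothetical helper; runtimes that return reasoning in a separate field, or that omit the opening tag, are not handled here.

```python
import re
from typing import Tuple

# Matches one leading <think>...</think> block, including surrounding whitespace.
_THINK_RE = re.compile(r"\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer).

    - Leading <think>...</think> block: returned as reasoning, the rest as answer.
    - Unclosed <think> (e.g. output truncated by the token limit): everything
      is treated as reasoning and the answer is empty.
    - No think block: reasoning is empty and the text is the answer.
    """
    match = _THINK_RE.match(text)
    if match is not None:
        return match.group(1).strip(), text[match.end():].strip()
    stripped = text.lstrip()
    if stripped.startswith("<think>"):
        return stripped[len("<think>"):].strip(), ""
    return "", text.strip()


if __name__ == "__main__":
    raw = "<think>Two pointers, compare heads.</think>\n\ndef merge(a, b): ..."
    reasoning, answer = split_reasoning(raw)
    print(answer)
```

An empty answer alongside non-empty reasoning is a useful signal that the output length limit was too small for the model to finish thinking.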

Model details

Developed by: Empero
Base model: Qwen3.5-9B — a dense, natively multimodal model with a hybrid attention stack (3:1 Gated DeltaNet linear-attention to Gated full-attention), ~152k vocabulary, long native context.
Fine-tune type: full parameter (all text-backbone weights trained), assistant-only loss. The vision tower was left unchanged from the base — so vision works (via the included mmproj) but was inherited, not specifically tuned.
Format: GGUF (text quants + CLIP mmproj), converted and quantized with llama.cpp.
Languages: primarily English.

Evaluation

The evaluation below was measured on the unquantized fine-tune. Quantized variants are very close at Q8_0/Q6_K and degrade gradually toward Q4_K_M — expect a small quality drop at the lower quants.

Training quality was tracked via held-out validation loss / token-accuracy on a 100-example split (80% Fable / 20% terminal), plus a qualitative generation review:

Step	eval loss	eval token-acc
100	0.743	0.784
300 (≈ epoch 1)	0.714	0.791
500	0.713	0.791

No overfitting: held-out loss decreased, then plateaued at ~0.71 through epoch 2; it never rose even as train loss fell to ~0.64. In a 34-prompt qualitative review, roughly 27/34 responses were clean and correct. The model was strongest on coding and terminal/agentic tasks, where it reached for current tooling (ss over netstat, git-filter-repo, Argon2id) and showed security-aware judgment (rotating a leaked key first, constant-time comparison). Full transcripts: sample_generations.md.
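
The overfitting check described here amounts to confirming that held-out loss never increases between evaluations. A small sketch using the numbers from the table above follows; the helper names are hypothetical, and the perplexity conversion assumes the reported loss is mean cross-entropy in nats per token, which the card does not state explicitly.

```python
import math
from typing import List, Tuple

# (step, eval loss, eval token accuracy), copied from the table above.
EVAL_LOG: List[Tuple[int, float, float]] = [
    (100, 0.743, 0.784),
    (300, 0.714, 0.791),
    (500, 0.713, 0.791),
]


def eval_loss_never_rose(log: List[Tuple[int, float, float]],
                         tolerance: float = 0.0) -> bool:
    """True if each eval loss is <= the previous one (within tolerance)."""
    losses = [loss for _, loss, _ in log]
    return all(later <= earlier + tolerance
               for earlier, later in zip(losses, losses[1:]))


def perplexity(loss: float) -> float:
    """Convert mean cross-entropy (ASSUMED to be in nats/token) to perplexity."""
    return math.exp(loss)


if __name__ == "__main__":
    print("eval loss never rose:", eval_loss_never_rose(EVAL_LOG))
    for step, loss, acc in EVAL_LOG:
        print(f"step {step}: loss={loss:.3f} ppl={perplexity(loss):.3f} acc={acc:.3f}")
    print(f"qualitative pass rate: {27 / 34:.1%}")
```

A small positive `tolerance` can be used to ignore evaluation noise on a split as small as 100 examples.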

Limitations

Reasoning model. Each response opens with a <think> block; strip it for end users and allow generous output length. Use repeat_penalty ≈ 1.05 to avoid the rare non-terminating reasoning loops.
Strongest within its domain (coding / agentic / reasoning). For general-knowledge or long-form factual questions, verify specifics as with any 9B model.
Reflects its base and teachers. A distillation fine-tune of Qwen3.5-9B on Claude Fable 5 and GPT-5.5 traces; it carries their style and limits and received no extra safety tuning. Add your own review/safety layer for production.
Quantization. Lower quants (esp. Q4_K_M) trade a little accuracy for size; use Q6_K/Q8_0 when quality matters most.

Quantization

Converted from the fine-tuned weights with llama.cpp convert_hf_to_gguf.py, then quantized with llama-quantize. The BF16 GGUF is the conversion base; the other quants (Q4_K_M, Q5_K_M, Q6_K, Q8_0) are derived from it. The mmproj is the base Qwen3.5-VL vision encoder (unchanged by fine-tuning). All files were verified to load and generate in llama.cpp, with both text (code, reasoning) and image understanding confirmed.
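
The two-stage pipeline described above can be sketched as a list of commands. This is an illustration under assumptions, not the exact invocation used for this release: the card does not state the flags, `conversion_commands` is a hypothetical helper, and the mmproj export is not covered. It assumes `convert_hf_to_gguf.py` and `llama-quantize` from a llama.cpp checkout are reachable from the working directory.

```python
import shlex
from typing import List

STEM = "Qwable-9B-Claude-Fable-5"
# Quant types shipped in this repo, all derived from the BF16 conversion base.
QUANT_TYPES = ["Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]


def conversion_commands(hf_model_dir: str, out_dir: str = ".") -> List[List[str]]:
    """Return the convert command followed by one quantize command per quant."""
    bf16 = f"{out_dir}/{STEM}-bf16.gguf"
    commands = [[
        "python", "convert_hf_to_gguf.py", hf_model_dir,
        "--outfile", bf16,
        "--outtype", "bf16",
    ]]
    for qtype in QUANT_TYPES:
        # llama-quantize takes: input file, output file, quant type.
        commands.append(["llama-quantize", bf16, f"{out_dir}/{STEM}-{qtype}.gguf", qtype])
    return commands


if __name__ == "__main__":
    # Print the commands for review; run each with subprocess.run(cmd, check=True).
    for cmd in conversion_commands("./Qwable-9B-Claude-Fable-5"):
        print(shlex.join(cmd))
```

Quantizing every variant from the same BF16 file, rather than from another quant, avoids compounding quantization error.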

Provenance & licensing

Weights are released under Apache-2.0, inherited from the Qwen3.5-9B base. The fine-tuning data comes from generated traces of Claude Fable 5 and GPT-5.5 (via the linked public datasets). Because those traces originate from third-party assistants, the providers' terms may apply to downstream training and distillation — if you plan to build on this model commercially, confirm your use aligns with those terms. Shared with the community for research and experimentation, as-is.

Acknowledgements

Developed and released by Empero
Base model: Qwen3.5-9B (Alibaba Qwen team)
Datasets: Glint-Research/Fable-5-traces, Roman1111111/gpt5.5-terminal
Tooling: llama.cpp, TRL, Transformers

URL: https://huggingface.co/empero-ai/Qwable-9B-Claude-Fable-5-GGUF
