URL: https://huggingface.co/empero-ai/Qwable-9B-Claude-Fable-5-GGUF

Qwable-9B-Claude-Fable-5-GGUF

Developed by Empero

GGUF quantizations of empero-ai/Qwable-9B-Claude-Fable-5 for llama.cpp, Ollama, LM Studio, and other GGUF runtimes. This repo ships a vision projector (mmproj), so the model runs as a full multimodal (image + text) assistant, not just text.

Qwable-9B-Claude-Fable-5 is a full-parameter fine-tune of Qwen3.5-9B on agentic coding and reasoning traces distilled from Claude Fable 5 and a GPT-5.5 terminal agent. For full training details and the complete evaluation, see the base model card.

Early release. Strong coding and agentic behavior out of the box; a full benchmark suite is underway and will be published. See Provenance & licensing.

Files

Text weights: pick one quant

| File | Quant | Size | Notes |
|---|---|---|---|
| Qwable-9B-Claude-Fable-5-Q4_K_M.gguf | Q4_K_M | 5.3 GB | recommended default: smallest, runs on ~6–8 GB VRAM |
| Qwable-9B-Claude-Fable-5-Q5_K_M.gguf | Q5_K_M | 6.1 GB | balanced quality / size |
| Qwable-9B-Claude-Fable-5-Q6_K.gguf | Q6_K | 6.9 GB | high quality |
| Qwable-9B-Claude-Fable-5-Q8_0.gguf | Q8_0 | 8.9 GB | near-lossless |
| Qwable-9B-Claude-Fable-5-bf16.gguf | BF16 | 17 GB | full precision (conversion base) |

Vision projector: for image input

| File | Size | Notes |
|---|---|---|
| mmproj-Qwable-9B-Claude-Fable-5-f16.gguf | 876 MB | CLIP vision encoder; required for images, pairs with any quant above |

Text-only use needs just a quant. For image understanding, download both a text quant and the mmproj.
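As a rough aid for choosing a file, the sketch below picks the largest quant from the table above that fits a given memory budget. It assumes file size on disk approximates the memory the weights need, and the `headroom_gb` allowance for KV cache and runtime buffers is an illustrative guess, not a figure from this card; real usage depends on context length and runtime. The helper names `pick_quant` and `quant_filename` are hypothetical.

```python
# Sizes copied from the Files tables in this card (GB on disk).
QUANTS = [
    ("Q4_K_M", 5.3),
    ("Q5_K_M", 6.1),
    ("Q6_K", 6.9),
    ("Q8_0", 8.9),
    ("BF16", 17.0),
]
MMPROJ_GB = 0.876  # mmproj-Qwable-9B-Claude-Fable-5-f16.gguf (876 MB)


def pick_quant(budget_gb, with_vision=False, headroom_gb=0.5):
    """Return the largest quant whose file fits the budget, or None.

    headroom_gb is an ASSUMED allowance for KV cache and buffers;
    it does not come from the model card.
    """
    available = budget_gb - headroom_gb
    if with_vision:
        # The projector is loaded alongside the text weights.
        available -= MMPROJ_GB
    fitting = [name for name, size in QUANTS if size <= available]
    return fitting[-1] if fitting else None


def quant_filename(quant):
    """Map a quant name to the file name used in this repo."""
    suffix = "bf16" if quant == "BF16" else quant
    return f"Qwable-9B-Claude-Fable-5-{suffix}.gguf"
```

For example, `pick_quant(8, with_vision=True)` leaves room for the mmproj and settles on a smaller quant than `pick_quant(8)` does.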

Usage

llama.cpp: text

llama-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf --jinja \
 -p "Write a Python function that merges two sorted lists." \
 --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.05 -n 2048
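If the command is launched from a script, the argument list can be assembled programmatically. The following is a minimal sketch that only builds the arguments and does not run anything. The two presets use the sampling values this card recommends (temp 0.6 for precise coding, temp 1.0 for general tasks); the function name `llama_cli_args` and the preset keys are hypothetical.

```python
import shlex

# Sampling values as recommended in this card.
PRESETS = {
    "coding": {"temp": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "general": {"temp": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
}


def llama_cli_args(model_path, prompt, task="coding", max_tokens=2048):
    """Build an argument list for llama-cli (hypothetical helper)."""
    s = PRESETS[task]
    return [
        "llama-cli", "-m", model_path, "--jinja",
        "-p", prompt,
        "--temp", str(s["temp"]),
        "--top-p", str(s["top_p"]),
        "--top-k", str(s["top_k"]),
        "--min-p", str(s["min_p"]),
        "--repeat-penalty", "1.05",
        "-n", str(max_tokens),
    ]


args = llama_cli_args(
    "Qwable-9B-Claude-Fable-5-Q4_K_M.gguf",
    "Write a Python function that merges two sorted lists.",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(args))
```

Passing the list to `subprocess.run(args)` would execute it, assuming `llama-cli` is on the PATH.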

llama.cpp: multimodal (image + text)

llama-mtmd-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf \
 --mmproj mmproj-Qwable-9B-Claude-Fable-5-f16.gguf \
 --image photo.jpg -p "Describe this image." \
 --temp 0.6 --top-p 0.95 --top-k 20 -n 512

Ollama

ollama run hf.co/empero-ai/Qwable-9B-Claude-Fable-5-GGUF:Q4_K_M

Or via a Modelfile (pulls in the vision projector for image support):

FROM ./Qwable-9B-Claude-Fable-5-Q4_K_M.gguf
FROM ./mmproj-Qwable-9B-Claude-Fable-5-f16.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05

Sampling & output format

  • Sampling (Qwen3.5 recommended): general tasks temp 1.0, precise coding temp 0.6; top_p 0.95, top_k 20, min_p 0. Use repeat_penalty 1.05 (a small bump from Qwen's default 1.0) to avoid rare non-terminating reasoning loops, and allow generous -n / max_new_tokens.
  • Reasoning model: every response opens with a <think>...</think> block before the final answer; parse and strip that span for end users.
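Stripping the reasoning span can be sketched as below. This is one possible approach, not the authors' tooling: it assumes the raw text contains literal `<think>` and `</think>` tags, and it treats an unterminated block (for example, output cut off by the token limit) as reasoning with no answer. Depending on the runtime and its settings, the reasoning may already be returned separately, in which case this step is unnecessary. The function name `split_reasoning` is hypothetical.

```python
import re

# Non-greedy match so only the first <think>...</think> span is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from raw model output."""
    match = _THINK_RE.search(text)
    if match:
        reasoning = match.group(1).strip()
        answer = (text[:match.start()] + text[match.end():]).strip()
        return reasoning, answer
    if "<think>" in text:
        # Block never closed: generation was likely truncated mid-reasoning.
        before, _, after = text.partition("<think>")
        return after.strip(), before.strip()
    # No reasoning block at all.
    return "", text.strip()
```

Show only the second element to end users; an empty answer alongside non-empty reasoning suggests the output length limit was too low.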

Model details

  • Developed by: Empero
  • Base model: Qwen3.5-9B, a dense, natively multimodal model with a hybrid attention stack (3:1 ratio of Gated DeltaNet linear-attention to Gated full-attention), ~152k vocabulary, long native context.
  • Fine-tune type: full parameter (all text-backbone weights trained), assistant-only loss. The vision tower was left unchanged from the base, so vision works (via the included mmproj) but was inherited, not specifically tuned.
  • Format: GGUF (text quants + CLIP mmproj), converted and quantized with llama.cpp.
  • Languages: primarily English.

Evaluation

The evaluation below was measured on the unquantized fine-tune. Quantized variants are very close at Q8_0/Q6_K and degrade gradually toward Q4_K_M; expect a small quality drop at the lower quants.

Training quality was tracked via held-out validation loss / token-accuracy on a 100-example split (80% Fable / 20% terminal), plus a qualitative generation review:

| Step | eval loss | eval token-acc |
|---|---|---|
| 100 | 0.743 | 0.784 |
| 300 (≈ epoch 1) | 0.714 | 0.791 |
| 500 | 0.713 | 0.791 |

No overfitting: held-out loss decreased, then plateaued (~0.71) through epoch 2. It never rose, even as train loss fell to ~0.64. In a 34-prompt qualitative review, roughly 27/34 responses were clean and correct, strongest on coding and terminal/agentic tasks: current tooling (ss over netstat, git-filter-repo, Argon2id) with security-aware judgment (rotating a leaked key first, constant-time comparison). Full transcripts: sample_generations.md.
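To put these numbers on a more familiar scale, the short calculation below converts the eval losses to perplexity and the review count to a percentage. It assumes the reported eval loss is a mean per-token cross-entropy in nats, which the card does not state explicitly; if the loss uses another base or reduction, the perplexity figures do not apply.

```python
import math

# (step, eval loss, eval token-acc) copied from the table above.
EVAL = [(100, 0.743, 0.784), (300, 0.714, 0.791), (500, 0.713, 0.791)]


def perplexity(loss):
    """exp(loss), valid only if loss is mean cross-entropy in nats (assumed)."""
    return math.exp(loss)


for step, loss, acc in EVAL:
    print(f"step {step}: loss {loss:.3f} -> perplexity {perplexity(loss):.2f}, "
          f"token-acc {acc:.3f}")

# Qualitative review: roughly 27 of 34 responses judged clean and correct.
clean_rate = 27 / 34
print(f"qualitative review: {clean_rate:.0%} clean")
```

Under that assumption the plateau near 0.71 corresponds to a perplexity of roughly 2, and the review rate works out to about 79%.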

Limitations

  • Reasoning model. Each response opens with a <think> block; strip it for end users and allow generous output length. Use repeat_penalty ≈ 1.05 for consistently crisp completions.
  • Strongest within its domain (coding / agentic / reasoning). For general-knowledge or long-form factual questions, verify specifics as with any 9B model.
  • Reflects its base and teachers. A distillation fine-tune of Qwen3.5-9B on Claude Fable 5 and GPT-5.5 traces; it carries their style and limits and received no extra safety tuning. Add your own review/safety layer for production.
  • Quantization. Lower quants (esp. Q4_K_M) trade a little accuracy for size; use Q6_K/Q8_0 when quality matters most.

Quantization

Converted from the fine-tuned weights with llama.cpp convert_hf_to_gguf.py, then quantized with llama-quantize. The BF16 GGUF is the conversion base; the K-quants are derived from it. The mmproj is the base Qwen3.5-VL vision encoder (unchanged by fine-tuning). All files were verified to load and generate in llama.cpp; text (code, reasoning) and image understanding were both confirmed.
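The two-step pipeline described above can be sketched as a list of commands. This is a reconstruction under assumptions, not the exact invocation used for this repo: the input directory name is hypothetical, Q8_0 is assumed to be derived from the BF16 file like the K-quants, and the mmproj export step is omitted. The code only builds the argument lists; it does not run them.

```python
# Quant types published in this repo besides the BF16 base.
QUANT_TYPES = ["Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]


def conversion_commands(hf_dir, name="Qwable-9B-Claude-Fable-5"):
    """Return the convert + quantize commands as argument lists."""
    base = f"{name}-bf16.gguf"
    cmds = [
        # Step 1: Hugging Face checkpoint -> BF16 GGUF (the conversion base).
        ["python", "convert_hf_to_gguf.py", hf_dir,
         "--outfile", base, "--outtype", "bf16"],
    ]
    for quant in QUANT_TYPES:
        # Step 2: derive each quant from the BF16 base.
        cmds.append(["llama-quantize", base, f"{name}-{quant}.gguf", quant])
    return cmds


# "./Qwable-9B-Claude-Fable-5" is a hypothetical local checkpoint directory.
for cmd in conversion_commands("./Qwable-9B-Claude-Fable-5"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from a llama.cpp checkout with the tools built.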

Provenance & licensing

Weights are released under Apache-2.0, inherited from the Qwen3.5-9B base. The fine-tuning data comes from generated traces of Claude Fable 5 and GPT-5.5 (via the linked public datasets). Because those traces originate from third-party assistants, the providers' terms may apply to downstream training and distillation. If you plan to build on this model commercially, confirm your use aligns with those terms. Shared with the community for research and experimentation, as-is.
