Qwen3.5-9B-MTP-SWE-Agent-GGUF
π Hugging Face
π GGUF
π Benchmark
A 9B Qwen3.5 merge tuned for SWE-agent style workflows: multi-turn tool use, debugging, structured code generation, and reasoning-heavy instruction following. The model keeps MTP draft layers for speculative decoding in llama.cpp.
Overview
| Item | Value |
|---|---|
| Model family | Qwen3.5-9B MTP merge |
| Training focus | SWE workflows, tool calling, concise instruction following, reasoning traces |
| Primary runtime | llama.cpp OpenAI-compatible API |
| Recommended quant | Qwen3.5-9B-MTP-SWE-Agent-GGUF-Q4_K_M.gguf |
| Size target | 9B class |
| Typical use | Agentic coding, debugging, tool planning, structured outputs |
Benchmark Snapshot
Measured locally against a live llama.cpp server with temperature 0.
| Metric | Result |
|---|---|
| Tests | 93 / 93 passed |
| Pass rate | 100.0% |
| Weighted score | 100.0% |
| Avg latency | 1.41 s |
| Median latency | 0.95 s |
| Avg generation speed | 95.2 tok/s |
Category Breakdown
| Category | Tests | Passed | Pass % | Score % | Avg latency | Avg gen tok/s |
|---|---|---|---|---|---|---|
| Debug | 15 | 15 | 100.0 | 100.0 | 2.54 s | 94.1 |
| Tool plan | 12 | 12 | 100.0 | 100.0 | 0.60 s | 96.3 |
| Tool call | 15 | 15 | 100.0 | 100.0 | 0.39 s | 96.2 |
| Code fix | 15 | 15 | 100.0 | 100.0 | 2.14 s | 94.5 |
| Workflow | 9 | 9 | 100.0 | 100.0 | 1.49 s | 94.8 |
| Discipline | 12 | 12 | 100.0 | 100.0 | 0.58 s | 96.6 |
| Patch | 6 | 6 | 100.0 | 100.0 | 3.00 s | 94.3 |
| Reasoning | 9 | 9 | 100.0 | 100.0 | 1.00 s | 94.6 |
Capability Matrix
| Capability | Score |
|---|---|
| Algorithm implementation | 100.0% |
| Complexity analysis | 100.0% |
| Concurrency debugging | 100.0% |
| Config inspection | 100.0% |
| Defensive None-guard | 100.0% |
| Dependency debugging | 100.0% |
| Exception handling | 100.0% |
| Format compliance (no markdown) | 100.0% |
| Git knowledge | 100.0% |
| Incident analysis | 100.0% |
| Incident response | 100.0% |
| Instruction following (short reply) | 100.0% |
| Memory profiling knowledge | 100.0% |
| No thinking tag leak | 100.0% |
| Off-by-one fix | 100.0% |
| PR workflow knowledge | 100.0% |
| Patch generation | 100.0% |
| Patch generation (docstring) | 100.0% |
| Python knowledge | 100.0% |
| Refactor planning | 100.0% |
| Root-cause analysis | 100.0% |
| Security fix (SQL injection) | 100.0% |
| Security review knowledge | 100.0% |
| Test execution planning | 100.0% |
| Token limit following | 100.0% |
| Tool call β exec_shell_command | 100.0% |
| Tool call β grep_search | 100.0% |
| Tool call β list_directory | 100.0% |
| Tool call β read_file | 100.0% |
| Tool call β write_file | 100.0% |
| Tool-use planning | 100.0% |
What the Benchmark Covers
| Area | Examples |
|---|---|
| Debugging | NoneType errors, connection pools, missing dependencies, race conditions, memory leaks |
| Tool planning | grep_search, read_file, write_file, exec_shell_command, list_directory |
| Tool calls | Structured OpenAI-style function calls with argument validation |
| Code repair | Python bug fixes, guards, binary search, SQL injection mitigation, exception wrapping |
| Workflow | PR checklists, incident response, code review checklists |
| Discipline | Exact replies, no fake turns, no markdown, token-limit compliance |
| Patch literacy | Unified diff generation and docstring edits |
| Reasoning | Complexity analysis and conflict resolution |
Representative SWE / Agentic Cases
| ID | What it validates |
|---|---|
swe_debug_plan |
Numbered debug plan for a NoneType.get error on auth.py:42 |
swe_pool_exhausted |
Root cause and remediation for connection pool exhaustion |
swe_missing_module |
Fix workflow for ModuleNotFoundError: requests |
agent_tool_plan |
Ordered multi-tool plan using repo search and file reads |
tool_read |
Correct read_file tool call |
tool_grep |
Correct grep_search tool call |
tool_pytest |
Correct exec_shell_command tool call |
GGUF Files
| Quantization | File |
|---|---|
| Q4_K_M | Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q4_K_M.gguf |
| Q8_0 | Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q8_0.gguf |
| BF16 | Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16.gguf |
| mmproj | Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16-mmproj.gguf |
Training Mix
| Dataset | Weight | Purpose |
|---|---|---|
| nebius/SWE-agent-trajectories | 35% | Real SWE agent traces |
| vsamuel/verbosity-control-training | 22% | Conciseness control |
| teknium/OpenHermes-2.5 | 20% | Instruction quality |
| Jackrong/Claude-opus-4.7-TraceInversion-5000x | 8% | Reasoning traces |
| Jackrong/Claude-opus-4.6-TraceInversion-9000x | 7% | Reasoning traces |
Notes
| Topic | Note |
|---|---|
| Benchmark style | Local API runs with fixed prompts and deterministic decoding |
| Output handling | Some backends split content and reasoning_content; clients should merge carefully if needed |
| Safety | Generated code should be reviewed before execution |
| SWE-bench | This page describes the projectβs local benchmark suite, not the SWE-bench Verified leaderboard |
Links
| Resource | Link |
|---|---|
| Model repository | https://huggingface.co/raicoon2k/Qwen3.5-9B-MTP-SWE-Agent-GGUF |
| llama.cpp | https://github.com/ggerganov/llama.cpp |
| SWE-bench | https://www.swebench.com/ |
- Downloads last month
- 1,415
GGUF
Hardware compatibility
Log In to add your hardware
4-bit
8-bit
16-bit
Inference Providers NEW
This model isn't deployed by any Inference Provider. π Ask for provider support
Model tree for raicoon2k/Qwen3.5-9B-MTP-SWE-Agent-GGUF
Base model
Qwen/Qwen3.5-9B-Base Finetuned
trohrbaugh/Qwen3.5-9B-heretic-v2 Quantized
Crownelius/Crow-9B-HERETIC-4.6