πͺ Qwopus3.6-27B-v2
SFT ReleaseReasoning-Enhanced Dense Language Model Fine-Tuned on Qwen3.6-27B
π‘ What is Qwopus3.6-27B-v2?
πͺ Qwopus3.6-27B-v2 is a reasoning-enhanced dense language model built on top of Qwen3.6-27B. By leveraging a multi-stage curriculum learning pipeline and augmented with Trace Inversion datasets (claude-opus-4.6/4.7-traceInversion), it reverse-engineers the compressed "Reasoning Bubbles" of commercial LLMs into structured, step-by-step synthetic reasoning traces, successfully eliminating logical shortcuts and knowledge fractures.
<think> tags.
π‘ 1. Base Model, Training Library & Cooperation
Qwen3.6-27B is a state-of-the-art dense large language model developed by Alibaba Cloud. Boasting 27 billion parameters, this base model natively supports long-context modeling and is engineered for agentic workflows, complex logical reasoning, and multimodal capabilities.
| Attribute | Specifications & Details |
|---|---|
| π§ Architecture | Dense Transformer / 27 Billion Parameters |
| π’ Developer | Alibaba Cloud (DAMO Academy) |
| π Context Window | Native Support Up to 32K / 128K context length |
| π― Focus Domains | Agentic Coding, Deep Logic Reasoning, Multimodal Tasks (Vision & Tool-use) |
| 𧬠Distillation Strategy | Cross-source SFT alignment and multi-teacher distillation to close the capability gap. |
| β‘ RL Scalability | Optimized for downstream Reinforcement Learning alignment and self-critical learning loops. |
Vision & Tool Calling Support: Qwopus3.6-27B-v2 natively supports vision and tool-use capabilities. To enable vision functionality, download
mmproj.gguffrom the GGUF Repository and place it in the same directory as the main.gguffile.
Community Release Notice: Qwopus3.6-27B-v2 is an experimental community release and has not undergone complete safety evaluations or standard benchmarking. It is intended solely for research and exploration.
π 2. Background & Motivation
This model integrates:
- claude-opus-4.6-traceInversion-9000x: 9,000 high-value, fully reconstructed step-by-step reasoning trajectories.
- claude-opus-4.7-traceInversion-5000x: 5,000 complex multi-turn logic and mathematics samples optimized for negative entropy reconstruction.
β‘ 3. Reasoning Efficiency & MTP Speedup
β‘ Reasoning Efficiency & MTP Acceleration
A compact view of how many output tokens are needed to produce correct answers, and how the MTP variant improves inference throughput.
| Metric Definition | Qwen3.6-27B | Qwopus3.6-27B-v2 | Efficiency Gain |
|---|---|---|---|
| Definition A: average output tokens on correctly answered questions only. | 1,433.3 tokens | 918.7 tokens | 35.9% fewer tokens |
| Definition B: total output tokens divided by the number of correct answers, including token cost from wrong answers. | 2,511.0 tokens | 2,155.8 tokens | 14.2% less systemic overhead |
| Metric | Qwen3.6-27B | Qwopus3.6-27B-v2 | Delta |
|---|---|---|---|
| Correct answers per 10,000 output tokens | 3.98 | 4.64 | +16.6% |
| Total output token cost | 738,238 tokens | 627,325 tokens | 15.0% fewer tokens |
| CoT Extraction Mode | Qwen3.6-27B | Qwopus3.6-27B-v2 | Reduction |
|---|---|---|---|
Normal thinking endings: text before the closing </think> tag only. |
1,680.3 tokens 5,169.4 chars |
798.5 tokens 2,370.0 chars |
52.5% shorter |
π 4. Evaluation & Benchmarks
π Evaluation & Performance Metrics
Detailed benchmark results on MMLU-Pro, SWE-bench, frontend page layout generation, creative coding, and agentic reasoning.
| Model | Correct / Total | Accuracy |
|---|---|---|
| Qwopus3.6-27B-v2 | 306 / 350 | 87.43% |
| Qwen3.6-27B | 297 / 350 | 84.86% |
| Category | Qwen3.6-27B | Qwopus3.6-27B-v2 | Delta |
|---|---|---|---|
| Biology | 96% | 96% | 0 pp |
| Business | 88% | 94% | +6 pp |
| Computer Science | 82% | 84% | +2 pp |
| Mathematics | 90% | 88% | -2 pp |
| Physics | 76% | 86% | +10 pp |
| Chemistry | 74% | 80% | +6 pp |
| Health | 88% | 84% | -4 pp |
Summary: On the selected 350-question MMLU-Pro evaluation set, Qwopus3.6-27B-v2 achieved 87.43% accuracy, outperforming Qwen3.6-27B at 84.86%. Qwopus3.6-27B-v2 is stronger in Business, Computer Science, Physics, and Chemistry, while Qwen3.6-27B remains ahead in Mathematics and Health.
| Model / Configuration | Sampling | Resolved | Empty Patches | Resolve % |
|---|---|---|---|---|
| Qwopus 3.6 27B v2 (dense) | temp 1.0, step 275, single-slot | 152 / 202 | 1 | 75.25% |
Execution Details: 19h 29m wall-clock on a single RTX 5090 using a 160K fp16 context window. Every instance successfully exited with Submitted status. 0 step-limit hits and 0 context-overflow failures occurred. Median trajectory length was 67 / 275 steps.
<think> block, whereas a higher temperature enables the model to utilize the full breadth of reasoning paths established during training.
| Metric | Qwopus 3.6 35B-A3B (MoE, Q5) | Qwopus 3.6 27B V2 (Dense, Q5) |
|---|---|---|
| Average Throughput | 161.9 tok/s | 43.9 tok/s |
| Throughput Range | 154.4 / 164.8 tok/s | 43.1 / 44.6 tok/s |
| VRAM Usage | ~25 GB (65K q8 context) | ~31 GB (160K fp16 context) |
| Completion Tokens (Suite) | 106,688 tokens | 119,036 tokens |
| Total Runtime (Suite) | 11.1 min | 45.3 min |
Architecture Trade-off: The MoE wins on raw throughput by ~3.7x due to its A3B routing pattern. However, the Dense 27B model offsets this with superior per-token reasoning depth. We recommend the Dense 27B model for complex agentic workflows, long-context reasoning, and code execution, and the MoE model for fast, high-throughput generations. Tight throughput variance (Β±0.75 tok/s) indicates the dense model is fully memory-bandwidth-bound.
| Prompt / Brief | Size (KB) | Tokens | Time | Reasoning Tokens |
|---|---|---|---|---|
| SaaS Landing Page (AI Observability) | 60.3 KB | 23,801 | 552 s | 836 |
| Analytics Dashboard (Light Theme) | 42.1 KB | 15,390 | 354 s | 1,898 |
| Designer Portfolio (Kinetic Typography) | 32.5 KB | 11,612 | 265 s | 1,459 |
| Pricing Page (3 Tiers + FAQ) | 26.6 KB | 9,360 | 213 s | 1,077 |
| Mobile App Marketing Page | 42.3 KB | 16,590 | 382 s | 1,650 |
| Sketch Name | Size (KB) | Tokens | Time | Configuration & Metrics |
|---|---|---|---|---|
| Particle Attractor (Fluid Swarm) | 9.4 KB | 4,308 | 97 s | temp 1.0 Β· 1,513 chars reasoning |
| Generative Flowfield (Ink Agents) | 13.9 KB | 7,237 | 163 s | temp 1.0 Β· 6,269 chars reasoning |
| Soft-Body Physics Sandbox | 18.0 KB | 6,827 | 154 s | temp 0.75 Β· 1,665 chars reasoning (shipped clean first run) |
| Audio-Reactive Visualizer | 10.7 KB | 5,731 | 129 s | temp 1.0 Β· 7,645 chars reasoning |
excluded-canvas/.
| Task Brief | Completion Tokens | Reasoning Characters | Time |
|---|---|---|---|
| Multi-step Planning (URL shortener deploy) | 2,238 | 7,067 | 50 s |
| Tool-use Planning (Flights, Hotel, Weather) | 1,262 | 2,807 | 28 s |
| Code Debugging (4-bug BST K-th smallest) | 1,753 | 5,225 | 39 s |
| Structured Extraction (Roster from prose) | 1,721 | 4,245 | 39 s |
| Self-Critique Loop (Palindrome optimization) | 1,255 | 3,309 | 28 s |
| Structured Extraction (No-think) | 351 | 0 (nothink) | 8 s |
- code_debug: Successfully caught all 4 bugs (sort order,
=vs==, useless loop, off-by-one errors). - self_critique: Followed the structured instruction loop (INITIAL β CRITIQUE β IMPROVED) and optimized a palindrome algorithm to O(nΒ²) expand-around-center.
- multi_step_planning: Designed a robust 10-step deployment plan with Dockerfile hand-off and explicit pip dependencies.
- tool_use_json: Resolved a 3-tool sequence (
search_flights,book_hotel,get_weather) with completely valid argument shapes.
πΊοΈ 5. Training & Data Pipeline Overview
The training process fuses Trace Inversion data augmentation with a Three-Stage Curriculum Learning pipeline. The core engineering focuses on expanding context length gradually while training on reconstructed reasoning traces to guarantee format stability.
[ πΊοΈ Trace Inversion: Reconstructing Distillation Workflow ]
A. Surrogate Model Training (Trace Inverter)
Open-source Model (GLM-5.1 / DS-V4) βββΊ Complete Reasoning Chain βββΊ [ Qwen3-235B Compression ] βββΊ Reasoning Bubbles
β β
ββββββββββββΊ [ Training ] βββββββββββ
(Base: Qwen3-4B-Instruct)
(Result: Trace-Inverter-4B)
B. Inversion Phase: Reconstructing Claude-4.7-Max
_______________________________________________________
| |
| Claude-4.7-Max API βββΊ Compressed Bubbles + Answer |
|_______________________________________________________|
β
βΌ
[ π§ Trace-Inverter-4B (Logic Reconstructor) ] βββΊ Synthetic Deep Reasoning Trace (Learnable CoT)
β
βΌ
[ π§© Data Splicing ] βββββββββββ (Original Prompt + Response)
(Embed reconstructed CoT in <think> tags, splicing with original prompt/response)
β
βΌ
(Result: claude-opus-4.6/4.7 inverted sets)
C. Final SFT Curriculum Pipeline
___________________________________________
| |
| Base Model (Qwen3.6-27B) |
|___________________________________________|
β
βΌ
[ π¦ Phase 1: Format Inception ] βββΊ [ π οΈ Phase 2: Complexity Expansion ] βββΊ [ π Phase 3: Long-Context SFT ]
( < 4096 tokens ) ( 4096 - 8192 tokens ) ( 8192 - 32K tokens )
(Short-context stable format) (Medium-complexity reasoning) (Long/Multi-turn / 10% replay)
β β
βββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββ
βΌ
_____________________________________________
| |
| π Final Model: Qwopus3.6-27B-v2 |
|_____________________________________________|
π― 6. Three-Stage Curriculum Learning
To steadily scale up the reasoning quality under long-context inference, Qwopus3.6-27B-v2 adopts a Curriculum Learning strategy, progressively mixing longer and more complex reasoning templates:
| Curriculum Stage | Focus & Sample Characteristics | Strategy Details |
|---|---|---|
| π¦ Stage 1: Format Inception | β’ Limit context within 4,096 tokens β’ Emphasize stable reasoning templates |
Focuses on short-to-medium length, cleanly formatted reasoning samples. The primary goal is to establish a reliable, structured reasoning output format (such as auto-closing <think> tags), preventing premature exposure to complex chains from causing format collapse. |
| π οΈ Stage 2: Complexity Expansion | β’ Extend length to 4,096 - 8,192 tokens β’ Introduce high-difficulty logic samples |
Gradually increases the ratio of complex reasoning chains. By aligned distillation with "teacher models" whose reasoning style distributions closely match the Qwen3.6 base, the capacity gap is controlled to achieve highly efficient knowledge transfer. |
| π Stage 3: Long-Context SFT | β’ Progressively scale window up to 32K tokens β’ 10% high-quality short sample replay |
In this stage, the model is pushed to deep reasoning scenarios under ultra-long context and multi-turn dialogues. To prevent capacity drift or degradation of short-instruction comprehension during long-text training, a 10% replay of high-quality short samples is strictly enforced. |
π¨ 7. Trace Inversion Case Studies (5 Key Domains Showcase)
To demonstrate how Trace Inversion reconstructs logical continuity and eliminates negative entropy, the following interactive panels show the contrast between raw compressed "Reasoning Bubbles" and the fully step-by-step reconstructed chain-of-thought (Learnable CoT) under 5 typical scenarios:
π Domain 1: Mathematics (Probability Calculation)
Define
First Draw
Second Draw
Multiply
Simplify
π Domain 2: Physics (Kinematics)
Goal
Extract
Match Formula
Compute
Verify
π» Domain 3: Coding (Algorithm Logic)
def sum_even_numbers(arr): return sum(x for x in arr if x % 2 == 0)
Iterate
Check Even
Accumulate
Edge Cases
Complexity
π§ Domain 4: Logical Reasoning (Syllogism)
Set Definition
Objective
Venn Analysis
Counterexample
Conclusion
π‘ Domain 5: Core Theory (Reasoning Bubble vs. Learnable CoT)
Define
Entropy
Gradient
Inversion
π Context Length and Long-Context Usage
During fine-tuning, this model was trained with a maximum sequence length of 32K tokens. The training data mixture was also constructed around samples up to 32K tokens, so the "Context Length Distribution" shown in this model card reflects the fine-tuning data distribution rather than a hard architectural limit.
The model still inherits the native long-context capability of the Qwen3.6 base model. Therefore, longer context windows such as 128K or 256K may be available in compatible inference runtimes, depending on the backend and configuration.
For practical long-context inference beyond 32K, especially when using llama.cpp / GGUF, it is recommended to enable RoPE/YaRN scaling instead of only increasing n_ctx / --ctx-size. Directly setting a larger context window without RoPE scaling may work in some cases, but it can be less stable and may not achieve the expected long-context performance.
This is consistent with Qwen community guidance for long-context GGUF usage: 128K context generally requires YaRN/RoPE scaling, and it is not necessarily enabled by default in llama.cpp. For example, Qwen maintainers have noted that "128K context length needs YaRN" and that it should be explicitly enabled when supported by the runtime.
Reference: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GGUF/discussions/2
Community feedback also suggests that RoPE/YaRN scaling can improve long-context stability for this model family. One user reported that, on HermesAgent-20, Qwopus3.6-35B-A3B-v1 performed better when extending from 32K to 128K via RoPE scaling than when directly setting a 128K context window without scaling, with scores of 83 vs. 72 in their setup. This result may vary depending on the backend, quantization type, KV cache settings, hardware, and benchmark configuration, but it is consistent with the recommendation to use RoPE/YaRN scaling for contexts beyond 32K.
Example llama.cpp configuration for extending from 32K to 128K:
./llama-server \
-m model.gguf \
--ctx-size 131072 \
--rope-scaling yarn \
--rope-scale 4 \
--yarn-orig-ctx 32768
For 256K context, users may need to adjust the scaling factor accordingly and validate the result in their own workload:
./llama-server \
-m model.gguf \
--ctx-size 262144 \
--rope-scaling yarn \
--rope-scale 8 \
--yarn-orig-ctx 32768
Please note that long-context behavior may vary depending on the inference backend, quantization type, KV cache settings, available memory, and task type. For best results, users should benchmark their own target workload when using contexts beyond 32K.
π€ 8. Collaboration & Training Details
This model is a collaborative milestone achieved with hardware engineer Kyle Hessling. You can follow him on X / Twitter: @KyleHessling1 to keep up with the latest hardware infrastructure and distributed training updates. π
| Dimension | Details & Infrastructure |
|---|---|
| π₯οΈ Training Hardware | NVIDIA DGX Cluster / H100 / RTX 6000 Pro |
| βοΈ Fine-tuning Framework | Unsloth (used for highly efficient SFT of dense models and memory optimization) |
β οΈ 9. Known Training & Deployment Issues (IMPORTANT)
While the 27B dense model architecture is relatively stable, certain low-level framework compatibility issues may still surface during large-scale parameter updates and complex long-context training. It is highly recommended to monitor the following technical risk points during secondary fine-tuning and deployment:
| Module / Component | Issue & Troubleshooting Diagnostics |
|---|---|
| π Weight Merge (LoRA Merger) |
When merging LoRA adapters back into the base model, it is highly susceptible to peak memory out-of-memory (OOM) errors. Ensure the merging host has sufficient virtual memory or perform the low-precision merge on the CPU. |
| π οΈ Dependency Compatibility | PEFT, Transformers 5.x fusion mode, and Unsloth patches may occasionally cause module import failures (ImportError) or weight mapping conflicts. Please align your dependency versions with those provided in our finetuning-guide repository. |
Local Fine-Tuning & Deployment Warning: If you attempt to run secondary fine-tuning or merge adapter weights locally, please proceed with caution and be prepared to manually patch model definition files or pin dependency versions strictly.
π 10. Resources & Guides
π GitHub Repository: Jackrong-llm-finetuning-guide Access the repository to dive into the codebase and reproduce our results locally or on Google Colab.
π 11. Acknowledgements
Special thanks to:
- The Qwen team for providing the powerful Qwen3.6 base model.
- Unsloth for providing the highly efficient fine-tuning framework.
- Open-source datasets and community contributors.
- Kyle Hessling for the close collaboration on this project.
π 12. Citation
@misc{jackrong_qwopus36_27b_v2,
title = {Qwopus3.6-27B-v2},
author = {Jackrong},
year = {2026},
publisher = {Hugging Face}
}
- Downloads last month
- 63,876
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit
