Qwen3.5-2B-ReMix (Reasoning Mix)
This repository contains a fully merged, native Float16 (F16) fine-tune of Qwen/Qwen3.5-2B. The goal is to improve performance on complex reasoning tasks, specifically advanced mathematics, logical deduction, and structured coding problems.
It is trained on distillation data drawn from multiple open-source datasets and aims for "frontier-style" reasoning while staying compact enough to run at native speed on local, everyday consumer hardware, with no external adapters required.
Model Highlights
- Base Architecture: Qwen/Qwen3.5-2B (Dense, Hybrid Gated DeltaNet)
- Precision Format: Native Float16 (F16) merged weights. No adapter required!
- Main Goal: Advanced mathematical reasoning and complex code generation/debugging.
- Data Origin: 100% open-source distilled reasoning datasets hosted on Hugging Face. No proprietary data or closed APIs (OpenAI, Anthropic, Google) were used in data collection or training.
- Target Environment: Local, high-efficiency edge execution with minimal hardware requirements.
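As a rough sanity check on the hardware claim, the size of the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch below assumes a nominal 2 billion parameters (the exact count is not stated in this card) and ignores activations, cache memory, and framework overhead, so real usage will be higher.

```python
def weight_footprint_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# Assumption: "2B" is taken as a nominal 2e9 parameters.
NOMINAL_PARAMS = 2e9

# F16 stores each parameter in 2 bytes.
f16_gib = weight_footprint_gib(NOMINAL_PARAMS, bytes_per_param=2)
print(f"F16 weights: ~{f16_gib:.2f} GiB")
```

Under these assumptions the weights come to a little under 4 GiB, which is the figure to compare against available RAM or VRAM.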
Recommended Generation Parameters
Depending on your use case, we recommend switching between "Everyday" and "Deep Reasoning" profiles to get the best performance out of the 2B architecture.
Everyday Use (Balanced)
| Parameter | Value | Note |
|---|---|---|
| Temperature (temp) | 0.4 | Provides a balance of creativity and coherence. |
| Top K (top_k) | 30 | Limits vocabulary to the most probable next steps. |
| Repeat Penalty | 1.1 | Light penalty to ensure conversational flow. |
Deep Reasoning
| Parameter | Value | Note |
|---|---|---|
| Temperature (temp) | 0.0 to 0.1 | Near-deterministic output for strict logical consistency. |
| Top K (top_k) | 60 | Wider pool for complex technical vocabulary. |
| Repeat Penalty | 1.2 | Prevents "reasoning loops" during long chain-of-thought. |
| enable_thinking | True | Enables reasoning mode, as described in the Qwen3.5 model card. |
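The two profiles can be kept as plain dictionaries and translated into keyword arguments for Transformers' `generate()`. This is a minimal sketch, not part of the model's own tooling: `PROFILES` and `to_generate_kwargs` are hypothetical names, and the Deep Reasoning temperature is set to 0.1, the upper end of the recommended range. Note that Transformers calls the repeat penalty `repetition_penalty`, and that temperature and top-k only apply when `do_sample=True`.

```python
# Values copied from the two tables above; key names follow the tables.
PROFILES = {
    "everyday": {"temperature": 0.4, "top_k": 30, "repeat_penalty": 1.1},
    "deep_reasoning": {"temperature": 0.1, "top_k": 60, "repeat_penalty": 1.2},
}


def to_generate_kwargs(profile_name: str) -> dict:
    """Map a profile to keyword arguments accepted by model.generate()."""
    profile = PROFILES[profile_name]
    kwargs = {"repetition_penalty": profile["repeat_penalty"]}
    if profile["temperature"] > 0:
        # Sampling must be switched on for temperature/top_k to matter.
        kwargs.update(
            do_sample=True,
            temperature=profile["temperature"],
            top_k=profile["top_k"],
        )
    else:
        # A temperature of 0 is expressed as greedy decoding instead.
        kwargs["do_sample"] = False
    return kwargs


print(to_generate_kwargs("deep_reasoning"))
```

A call would then look like `model.generate(**model_inputs, max_new_tokens=1024, **to_generate_kwargs("everyday"))`. `enable_thinking` is left out on purpose: in the Qwen3 family it is a chat-template option rather than a sampling parameter, and this sketch assumes the same holds here.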
Training & Merge Details
The model was adapted using Parameter-Efficient Fine-Tuning (PEFT) with LoRA. The adapter was then merged back into the base network layers with Unsloth, producing a single, unified set of F16 weights.
- Training Steps: 175
- Loss Profile: Convergence floor reached ~0.58; stabilized consistently around 0.85
- Learning Rate: 4e-5
- LoRA Rank ($R$) during training: 16
- LoRA Alpha ($\alpha$) during training: 32
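To make the merge step concrete: under standard LoRA scaling, the adapter update is multiplied by $\alpha / R$ before being added to the base weight, which gives a factor of 2 for the rank and alpha listed here. The sketch below assumes that standard scaling (not a rank-stabilized variant) and uses tiny hand-written matrices; `lora_scale`, `matmul`, and `merge_lora` are illustrative names, not Unsloth or PEFT functions.

```python
def lora_scale(rank: int, alpha: float) -> float:
    """Standard LoRA scaling factor alpha / rank."""
    return alpha / rank


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, scale):
    """Return W + scale * (B @ A), the merged weight matrix."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


scale = lora_scale(rank=16, alpha=32)  # 2.0 for the settings listed here

# Toy 2x2 weight with rank-1 factors, purely for readability;
# real adapters use rank-16 factors on much larger matrices.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, 0.25]]
merged = merge_lora(w, lora_b, lora_a, scale)
print(merged)
```

Once merged this way, the weights carry the fine-tune directly, which is why no adapter has to be loaded at inference time.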
Limitations & Risks
While this fine-tune aggressively pushes the boundaries of what a 2B parameter model can achieve locally, users should carefully account for the following behaviors:
- Hallucinations: Like all highly compact models, it can confidently present false calculations or flawed code as absolute facts. Always verify outputs.
- Inconsistent Styles: Due to the "ReMix" nature of the training data, the model may occasionally exhibit shifting output structures or stylistic variations.
- Logic Mismatches: For extremely niche programming or high-level academic proofs, the model may occasionally produce broken syntax or reverse its logical assertions.
How to Use Natively
Using Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "YOUR_USERNAME/Qwen3.5-2B-ReMix"

# Load the tokenizer and the merged F16 weights directly (no adapter needed)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the logic of a quicksort algorithm and implement it in Python."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Deep Reasoning parameters; max_new_tokens caps the output so the model does not overthink
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,  # required for temperature and top_k to take effect
    temperature=0.1,
    top_k=60,
    repetition_penalty=1.2,  # Transformers' name for the repeat penalty
)

# Decode only the newly generated tokens, skipping the prompt
new_tokens = generated_ids[0][model_inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
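When reasoning mode is on, Qwen3-family models typically emit their chain of thought inside `<think>...</think>` before the final answer. Assuming this fine-tune keeps that convention (the card does not confirm it), a small helper can separate the two parts of the decoded text. `split_thinking` is a hypothetical name used only for illustration.

```python
def split_thinking(decoded: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumes the reasoning ends with `close_tag`. The opening tag may be
    missing if the chat template already placed it in the prompt.
    """
    head, sep, tail = decoded.partition(close_tag)
    if not sep:
        # No closing tag found: treat everything as the answer.
        return "", decoded.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print(answer)
```

Keeping the reasoning separate makes it easier to log or hide the chain of thought while showing only the final answer to end users.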
Uploaded finetuned model
- Developed by: ertghiu256
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3.5-2B
This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.