Qwen3-8B-Base SFT LoRA — COIG-CQIA
LoRA adapter (r=32, α=64) over Qwen/Qwen3-8B-Base, instruction-tuned on
m-a-p/COIG-CQIA (11 hand-picked subsets, 6,654 train samples, ChatML format).
Trained on RunPod B200 in 36 minutes.
This is the SFT-only adapter. For the full SFT+DPO stack, see
JeffCheng12138/qwen3-8b-dpo-ultrafeedback-zh.
Project repo: https://github.com/tutucheng99/qwen3-sft-dpo-eval
Quick load
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-8B-Base", dtype="bfloat16", device_map="cuda", trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "JeffCheng12138/qwen3-8b-sft-coig-cqia")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base", trust_remote_code=True)
Caveats
- Statistical evaluation found this SFT regressed below BASE on a 40-prompt pairwise judge (BASE wins 70%, 95% CI [0.60, 0.80]). The COIG-CQIA filter steered toward terser responses. See https://github.com/tutucheng99/qwen3-sft-dpo-eval/blob/main/docs/REPORT.md §4.1.
- DPO on top recovers the regression — the DPO adapter is the recommended artifact for actual use.
- EOS not strongly learned; deployments should post-process trailing
low-frequency tokens. See
serve/hf_serve.py:clean_trailingin the repo.
Training config
- LoRA:
r=32, α=64, dropout=0.05, target all attention + MLP linear layers - Optimizer: AdamW fused, lr
2e-4, cosine schedule, warmup 0.03 - 2 epochs, effective batch 16 (per-device 4 × accum 4), bf16, sdpa attention
- Hardware: 1× B200 (192 GB HBM3e)
- Downloads last month
- 4
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for JeffCheng12138/qwen3-8b-sft-coig-cqia
Base model
Qwen/Qwen3-8B-Base