Qwen3-8B-Base SFT LoRA — COIG-CQIA

LoRA adapter (r=32, α=64) over Qwen/Qwen3-8B-Base, instruction-tuned on m-a-p/COIG-CQIA (11 hand-picked subsets, 6,654 train samples, ChatML format). Trained on RunPod B200 in 36 minutes.

This is the SFT-only adapter. For the full SFT+DPO stack, see JeffCheng12138/qwen3-8b-dpo-ultrafeedback-zh.

Project repo: https://github.com/tutucheng99/qwen3-sft-dpo-eval

Quick load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
 "Qwen/Qwen3-8B-Base", dtype="bfloat16", device_map="cuda", trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "JeffCheng12138/qwen3-8b-sft-coig-cqia")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base", trust_remote_code=True)

Caveats

Statistical evaluation found this SFT regressed below BASE on a 40-prompt pairwise judge (BASE wins 70%, 95% CI [0.60, 0.80]). The COIG-CQIA filter steered toward terser responses. See https://github.com/tutucheng99/qwen3-sft-dpo-eval/blob/main/docs/REPORT.md §4.1.
DPO on top recovers the regression — the DPO adapter is the recommended artifact for actual use.
EOS not strongly learned; deployments should post-process trailing low-frequency tokens. See serve/hf_serve.py:clean_trailing in the repo.

Training config

LoRA: r=32, α=64, dropout=0.05, target all attention + MLP linear layers
Optimizer: AdamW fused, lr 2e-4, cosine schedule, warmup 0.03
2 epochs, effective batch 16 (per-device 4 × accum 4), bf16, sdpa attention
Hardware: 1× B200 (192 GB HBM3e)

Downloads last month: 4

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for JeffCheng12138/qwen3-8b-sft-coig-cqia

Base model

Qwen/Qwen3-8B-Base

Adapter

(73)

this model

URL: https://huggingface.co/JeffCheng12138/qwen3-8b-sft-coig-cqia

⇱ JeffCheng12138/qwen3-8b-sft-coig-cqia · Hugging Face

Qwen3-8B-Base SFT LoRA — COIG-CQIA

Quick load

Caveats

Training config

Model tree for JeffCheng12138/qwen3-8b-sft-coig-cqia

Dataset used to train JeffCheng12138/qwen3-8b-sft-coig-cqia