VOOZH about

URL: https://huggingface.co/JeffCheng12138/qwen3-8b-sft-coig-cqia

⇱ JeffCheng12138/qwen3-8b-sft-coig-cqia · Hugging Face


Qwen3-8B-Base SFT LoRA — COIG-CQIA

LoRA adapter (r=32, α=64) over Qwen/Qwen3-8B-Base, instruction-tuned on m-a-p/COIG-CQIA (11 hand-picked subsets, 6,654 train samples, ChatML format). Trained on RunPod B200 in 36 minutes.

This is the SFT-only adapter. For the full SFT+DPO stack, see JeffCheng12138/qwen3-8b-dpo-ultrafeedback-zh.

Project repo: https://github.com/tutucheng99/qwen3-sft-dpo-eval

Quick load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
 "Qwen/Qwen3-8B-Base", dtype="bfloat16", device_map="cuda", trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "JeffCheng12138/qwen3-8b-sft-coig-cqia")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base", trust_remote_code=True)

Caveats

  • Statistical evaluation found this SFT regressed below BASE on a 40-prompt pairwise judge (BASE wins 70%, 95% CI [0.60, 0.80]). The COIG-CQIA filter steered toward terser responses. See https://github.com/tutucheng99/qwen3-sft-dpo-eval/blob/main/docs/REPORT.md §4.1.
  • DPO on top recovers the regression — the DPO adapter is the recommended artifact for actual use.
  • EOS not strongly learned; deployments should post-process trailing low-frequency tokens. See serve/hf_serve.py:clean_trailing in the repo.

Training config

  • LoRA: r=32, α=64, dropout=0.05, target all attention + MLP linear layers
  • Optimizer: AdamW fused, lr 2e-4, cosine schedule, warmup 0.03
  • 2 epochs, effective batch 16 (per-device 4 × accum 4), bf16, sdpa attention
  • Hardware: 1× B200 (192 GB HBM3e)
Downloads last month
4
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for JeffCheng12138/qwen3-8b-sft-coig-cqia

Adapter
(73)
this model

Dataset used to train JeffCheng12138/qwen3-8b-sft-coig-cqia