VOOZH about

URL: https://huggingface.co/XinyuanWang/qwen3-8b-medical-lora

⇱ XinyuanWang/qwen3-8b-medical-lora · Hugging Face


Medical LLM Fine-tuning & RAG Pipeline (Qwen3-8B)

Fine-tuning Qwen3-8B on medical domain data using QLoRA, with a Retrieval-Augmented Generation (RAG) system built on top for knowledge-grounded medical question answering.

Model on Hugging Face: XinyuanWang/qwen3-8b-medical-lora


Overview

This project consists of two main components:

  1. QLoRA Fine-tuning — Qwen3-8B is fine-tuned on medical Q&A data using 4-bit NF4 quantization and LoRA adapters, reducing GPU memory requirements while preserving model quality.

  2. RAG Pipeline — A FAISS-based retrieval system indexes 125,847 chunks from 18 classic medical textbooks (MedRAG/textbooks). At inference time, relevant passages are retrieved and injected into the prompt before generation.


Results

Metric Value
Final training loss 0.8553
Final eval loss 0.9455
Train/eval gap 0.091 (~10%, no overfitting)
LoRA adapter size 174 MB (vs. 16 GB full model)
RAG knowledge base 125,847 chunks, 18 textbooks
Retrieval speed ~145 docs/sec (BGE-M3 on GPU)

Setup

Requirements

# Fine-tuning
cd LLaMA-Factory && pip install -e ".[torch,bitsandbytes]"

# RAG + inference
pip install torch transformers peft bitsandbytes accelerate
pip install sentence-transformers faiss-cpu datasets

1. Prepare Training Data

python scripts/download_dataset.py # downloads 1,000 samples from OpenMed/Medical-Reasoning-SFT-Mega
python scripts/prepare_dataset.py # converts to Alpaca format → data/medical_train.json

2. Fine-tune the Model

python scripts/download_base_model.py # downloads Qwen3-8B locally
cp configs/dataset_info.json LLaMA-Factory/data/
cd LLaMA-Factory
llamafactory-cli train ../configs/train_medical_ft.yaml # output: saves/qwen3-8b-med-lora/

3. Build the RAG Index

python scripts/download_medrag_textbooks.py # downloads MedRAG/textbooks → data/medrag_textbooks.jsonl
python scripts/build_rag_index.py # builds FAISS index → rag_index/

4. Run Inference

# Interactive mode
python scripts/rag_inference.py --interactive

# Single query
python scripts/rag_inference.py --query "What are the symptoms of appendicitis?"

# Batch inference (50 samples, saves to data/ragas_input.json)
python scripts/run_rag_inference.py

Model Details

Base model: Qwen/Qwen3-8B

Fine-tuning configuration:

Parameter Value
Method QLoRA (SFT)
Quantization 4-bit NF4 + double quantization
LoRA rank / alpha 16 / 32
LoRA dropout 0.05
Target modules q/k/v/o_proj, gate/up/down_proj
Epochs 3
Effective batch size 8 (2 × 4 gradient accumulation)
Learning rate 5e-5 (cosine, 10% warmup)
Sequence length 512 tokens (packing enabled)
Precision bfloat16
Framework LLaMA-Factory

Training data: OpenMed/Medical-Reasoning-SFT-Mega — 1,000 samples, 4,966 Alpaca records after conversion.


RAG Details

Knowledge base: MedRAG/textbooks — 18 medical textbooks including Harrison's Internal Medicine, Schwartz's Surgery, Adams' Neurology, Katzung Pharmacology, Robbins Pathology, and more.

Pipeline:

Query → BGE-M3 Embedding → FAISS Top-5 Retrieval → Prompt → Qwen3-8B → Answer
Component Choice
Embedding model BAAI/bge-m3 (1024-dim)
Index type faiss.IndexFlatIP (exact cosine search)
Retrieval threshold 0.45 cosine similarity
Generation 4-bit NF4, greedy decoding, enable_thinking=False

Repository Structure

medical-llm-finetune/
├── configs/
│ ├── train_medical_ft.yaml # Training configuration
│ └── dataset_info.json # LLaMA-Factory dataset registry
├── data/
│ ├── medical_reasoning_1k.json # Raw downloaded samples
│ ├── medical_train.json # Alpaca format training set
│ └── ragas_input.json # Batch inference results (50 samples)
├── saves/
│ └── qwen3-8b-med-lora/ # LoRA adapter + tokenizer
├── rag_index/ # FAISS index + chunk metadata (local only)
├── models/ # Base model weights (local only)
└── scripts/
 ├── download_dataset.py
 ├── prepare_dataset.py
 ├── download_base_model.py
 ├── download_medrag_textbooks.py
 ├── build_rag_index.py
 ├── rag_inference.py
 └── run_rag_inference.py

References

Downloads last month
8
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for XinyuanWang/qwen3-8b-medical-lora

Finetuned
Qwen/Qwen3-8B
Adapter
(1448)
this model

Dataset used to train XinyuanWang/qwen3-8b-medical-lora