Phi-3 Mini Medical QLoRA Adapters
This repository contains lightweight QLoRA (Quantized Low-Rank Adaptation) adapters for Phi-3-mini-4k-instruct, fine-tuned on medical dialogues.
Fine-tuned by: Shahnawaz Alam
Description
Instead of updating all 3.8 billion parameters, this approach freezes the base model and trains only small adapter layers (~114 MB). This keeps the model cheap to train and share while adapting it to medical conversational tasks, patient queries, and healthcare Q&A.
Dataset
The model was fine-tuned on the MedDialog dataset, which contains real-world doctor-patient conversations. This helps the model understand medical contexts, symptomatic inquiries, and professional medical dialogue structures.
Usage
To use these adapters, you need to load the base model and then apply the adapters on top using the peft library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# 1. Load the base model (Unsloth version used during training)
base_model_id = "unsloth/Phi-3-mini-4k-instruct"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
# 2. Load the QLoRA adapters
adapter_id = "shahnawaz-alam37/phi-3-mini-medical-qlora"
model = PeftModel.from_pretrained(base_model, adapter_id)
# 3. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
# 4. Generate medical response
alpaca_prompt = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}
"""
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Act as a medical assistant.",  # Instruction
            "I have been having a persistent headache for three days.",  # Input
            "",  # Output - leave blank for generation
        )
    ],
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map="auto"
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
print(response)
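The decoded string contains the full prompt followed by the generated answer. A minimal sketch for keeping only the answer, assuming the Alpaca-style template shown above. The helper name `extract_response` and the default end-of-sequence string are illustrative assumptions; in practice pass `tokenizer.eos_token`.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Return the text after the last response marker, without the EOS token."""
    # Everything up to and including the marker is the echoed prompt.
    # If the marker is missing, rpartition leaves the whole string in `answer`.
    _, _, answer = decoded.rpartition(RESPONSE_MARKER)
    # Cut at the end-of-sequence token if the tokenizer left one in the text.
    answer = answer.split(eos_token)[0]
    return answer.strip()


sample = (
    "### Instruction:\nAct as a medical assistant.\n"
    "### Response:\nPlease see a doctor.<|endoftext|>"
)
print(extract_response(sample))  # prints: Please see a doctor.
```

With the usage code above, this would be called as `extract_response(response, tokenizer.eos_token)`.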
Training Details
Frameworks & Libraries
- Library: Unsloth & Hugging Face transformers
- Method: QLoRA (Quantized Low-Rank Adaptation) via peft
- Hardware: Google Colab (NVIDIA T4 GPU)
Quantization Configuration
- 4-bit Quantization: Enabled (via bitsandbytes)
- Quantization Type: NF4 (NormalFloat 4-bit)
- Double Quantization: Enabled
- Compute Dtype: bfloat16
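These settings map onto the `BitsAndBytesConfig` class in transformers roughly as follows. This is a sketch of an equivalent configuration, not the actual training script, which may have set these options through Unsloth instead. The names `QUANT_SETTINGS` and `build_quant_config` are illustrative.

```python
# Quantization settings taken from the list above.
QUANT_SETTINGS = {
    "load_in_4bit": True,                  # 4-bit quantization via bitsandbytes
    "bnb_4bit_quant_type": "nf4",          # NormalFloat 4-bit
    "bnb_4bit_use_double_quant": True,     # double quantization
    "bnb_4bit_compute_dtype": "bfloat16",  # dtype used for matmuls
}


def build_quant_config():
    """Build a transformers BitsAndBytesConfig from QUANT_SETTINGS."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**QUANT_SETTINGS)
```

The resulting object can be passed as `quantization_config=build_quant_config()` to `AutoModelForCausalLM.from_pretrained` when loading the base model in 4-bit.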
LoRA Configuration
- Rank (r): 16
- Alpha: 16
- Dropout: 0 (None)
- Target Modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
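The ~114 MB adapter size quoted in the Description can be sanity-checked from this configuration: a LoRA adapter on a linear layer with `in` inputs and `out` outputs adds `r * (in + out)` parameters. The sketch below assumes Phi-3-mini dimensions (hidden size 3072, MLP size 8192, 32 layers, unfused projections) and float32 adapter storage. None of these are stated in this card, so treat them as assumptions.

```python
RANK = 16  # from the LoRA configuration above

# Assumed Phi-3-mini dimensions (not stated in this card).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 32

# (in_features, out_features) for each targeted projection, assuming unfused layers.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    # LoRA adds two matrices: A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
size_mib = total * 4 / 2**20  # 4 bytes per parameter, assuming float32 storage

print(f"{total:,} adapter parameters, about {size_mib:.0f} MiB")
# prints: 29,884,416 adapter parameters, about 114 MiB
```

Under these assumptions the adapters hold roughly 30 million trainable parameters, under 1% of the 3.8 billion in the base model.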
Training Hyperparameters
- Epochs: 3
- Batch Size: 2 (per device)
- Gradient Accumulation Steps: 4 (effective batch size = 8)
- Optimizer: AdamW ("paged_adamw_32bit")
- Learning Rate: 2e-4
- LR Scheduler: Linear
- Max Sequence Length: 2048
- Warmup Ratio: 0.05
- Weight Decay: 0.01
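A short sketch of how these hyperparameters combine into an effective batch size, a step count, and a learning-rate curve. The dataset size of 10,000 examples is a hypothetical placeholder, since this card does not state how many examples were used. The function names are illustrative, and the step count is approximate. `linear_lr` reproduces the shape of a linear warmup followed by linear decay to zero.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05


def schedule(num_examples: int, num_devices: int = 1):
    """Return (effective batch size, total optimizer steps, warmup steps)."""
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return effective_batch, total_steps, warmup_steps


def linear_lr(step: int, total_steps: int, warmup_steps: int,
              peak_lr: float = PEAK_LR) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


# Hypothetical dataset size, for illustration only.
effective_batch, total_steps, warmup_steps = schedule(num_examples=10_000)
print(effective_batch, total_steps, warmup_steps)  # prints: 8 3750 188
```

With a single GPU the effective batch size is 2 × 4 = 8, matching the value listed above.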
Training Procedure
The model was fine-tuned on the MedDialog dataset with supervised fine-tuning, using the SFTTrainer from the TRL library together with Unsloth's optimizations. The base model was loaded in 4-bit to minimize VRAM usage, while the LoRA adapters were trained in 16-bit precision, allowing training on a single NVIDIA T4 GPU.
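A back-of-the-envelope estimate shows why 4-bit loading matters on a 16 GB T4. This sketch counts weight storage only, using the 3.8 billion parameter figure from this card. It ignores quantization constants, any layers kept in higher precision, activations, and optimizer state, so real usage will be higher.

```python
NUM_PARAMS = 3.8e9  # base model size quoted in this card


def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_gb(NUM_PARAMS, 16)
nf4_gb = weight_gb(NUM_PARAMS, 4)

print(f"fp16 weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
# prints: fp16 weights: ~7.6 GB, 4-bit weights: ~1.9 GB
```

Loading in 4-bit leaves most of the GPU memory free for activations, the adapter weights, and their optimizer state.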
Declaration & Disclaimer
Intended Use: This model is developed strictly for research, educational, and demonstration purposes in the field of Natural Language Processing (NLP) and conversational AI. It is designed to showcase how LLMs can be adapted for medical dialogue understanding.
NOT a Medical Device: This model is NOT a certified medical device, clinical decision support system, or a substitute for professional medical advice, diagnosis, or treatment. It must not be used to make real-world clinical decisions or to provide medical advice to actual patients.
Accuracy & Hallucinations: While fine-tuned on the MedDialog dataset to improve medical context, this model is still a statistical language model. It is prone to hallucinations, biases, and may generate plausible-sounding but factually incorrect, outdated, or dangerous medical information.
Data Limitations: The model's knowledge and tone are bounded by the base model (Phi-3-mini) and the MedDialog dataset. It may not accurately represent all medical specialties, diverse patient demographics, or recent medical advancements.
Authorship Declaration: This model was fine-tuned by Shahnawaz Alam as a personal research project. The underlying base model was developed by Microsoft, and the fine-tuning was facilitated by the Unsloth library. The author assumes no liability for any actions taken based on the outputs of this model.
By using this model, you explicitly agree to these terms and accept full responsibility for its use.
- PEFT 0.19.1
