
MadriMed-VL-2B-enc

A 2B-parameter medical vision-language model, trained for medical image understanding, radiology report assistance, clinical visual question answering, and medical text reasoning.

This release introduces Dynamic LoRA Scaling, a lightweight calibration technique that reduces adapter dominance while preserving the medical knowledge learned during fine-tuning. The objective is to improve reliability, reduce hallucinations, and mitigate common diagnostic confusion patterns observed in earlier releases of madrisight/MadriMed-VL-2B.
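The card does not say how Dynamic LoRA Scaling is computed. As a minimal sketch, assuming it amounts to multiplying the standard LoRA update by an extra calibration factor λ ≤ 1 before merging (W' = W + λ·(α/r)·B·A), the merge would look like this. `merge_lora_scaled` and `lam` are hypothetical names used for illustration, not part of this release.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora_scaled(w: Matrix, b: Matrix, a: Matrix, alpha: float, lam: float) -> Matrix:
    """Return W + lam * (alpha / r) * B @ A.

    Hypothetical illustration: lam < 1 shrinks the adapter's contribution
    relative to the frozen base weight W; lam = 1 is ordinary LoRA merging.
    """
    r = len(a)  # LoRA rank = number of rows of A (and columns of B)
    scale = lam * alpha / r
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy example: 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # shape (2, 1)
A = [[0.5, 0.5]]    # shape (1, 2)
full = merge_lora_scaled(W, B, A, alpha=2.0, lam=1.0)
half = merge_lora_scaled(W, B, A, alpha=2.0, lam=0.5)
print(full)  # [[2.0, 1.0], [2.0, 3.0]]
print(half)  # [[1.5, 0.5], [1.0, 2.0]]
```

With λ = 0.5 the adapter's update is halved while W is unchanged, which is one plausible reading of "reducing adapter dominance".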

Important: Use bfloat16 for Inference

This model was trained and calibrated in bfloat16 (BF16) precision with PyTorch on MPS (Apple Silicon). The published weights are stored in F32, so load the model in BF16 for the best performance and reproducibility.
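A minimal sketch for choosing a device that matches the note above. `pick_device` is a hypothetical helper; it takes the availability flags as plain booleans so the selection logic can be checked without a GPU.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


print(pick_device(False, True))  # mps
```

In practice you would call it as `pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())` and pass `dtype=torch.bfloat16` (`torch_dtype=` on older transformers releases) to `from_pretrained`.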

🚀 Quick Start

Installation

pip install -U transformers torch Pillow
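`Qwen3VLForConditionalGeneration` is only available in recent transformers releases. A minimal sketch of a version guard, assuming 4.57.0 as the minimum version (check the transformers release notes to confirm). `version_at_least` is a hypothetical helper that compares only the leading numeric components, so pre-release tags such as `rc1` or `.dev0` are ignored.

```python
import re
from typing import Tuple


def _numeric_parts(version: str) -> Tuple[int, ...]:
    """Leading integer of each dot-separated piece, stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def version_at_least(installed: str, required: str) -> bool:
    """True if `installed` >= `required`, comparing numeric components only."""
    a, b = _numeric_parts(installed), _numeric_parts(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so "4.57" compares equal to "4.57.0"
    b += (0,) * (width - len(b))
    return a >= b


print(version_at_least("4.57.1", "4.57.0"))  # True
print(version_at_least("4.56.2", "4.57.0"))  # False
```

Usage: `import transformers; assert version_at_least(transformers.__version__, "4.57.0")`. If the `packaging` library is installed, `packaging.version.Version` performs the same comparison with full pre-release handling.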

Example image: MM-1-a (a mammogram, loaded from MM-1-a.png in the code below)

Run the model directly

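import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

BASE_MODEL_ID = "madrisight/MadriMed-VL-2B-enc"

# Prefer CUDA, then Apple MPS, then CPU.
if torch.cuda.is_available():
    DEVICE = "cuda"
elif torch.backends.mps.is_available():
    DEVICE = "mps"
else:
    DEVICE = "cpu"

# The checkpoint is stored in F32; load it in bfloat16 as recommended above.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    dtype=torch.bfloat16,  # use torch_dtype= on older transformers releases
    trust_remote_code=True,
).to(DEVICE)

model.eval()

processor = AutoProcessor.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)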

def load_direct_image(path: str) -> Image.Image:
    """Open an image file and return an RGB copy (the file is closed afterwards)."""
    with Image.open(path) as raw:
        img = raw.convert("RGB")
    return img


# Formulate the query
prompt = """Choose the correct option for the question

Instructions:
- Analyze ONLY the provided image.
- Do NOT use external medical knowledge.
- Briefly explain the visual evidence relevant to the question.

Question: 
Examine the mammogram image shown above. Which of the following findings is most evident?

Options
A. Well-circumscribed round mass with benign features
B. Clustered microcalcifications within an area of irregular density
C. Fat-containing lesion consistent with lipoma
D. Diffuse bilateral breast edema
"""

img = load_direct_image("/content/MM-1-a.png")  # adjust to where you saved MM-1-a.png


messages = [
    {
        "role": "system",
        "content": "You are an expert medical AI. You must deeply analyze the question and provide the final answer.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ],
    },
]
stop_token_id = processor.tokenizer.convert_tokens_to_ids("<|im_end|>")

with torch.inference_mode():
    # Render the chat template into a prompt string (image placeholder included).
    text = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    inputs = processor(
        text=text,
        images=[img],
        return_tensors="pt",
    ).to(model.device)

    generated_ids = model.generate(
        **inputs,
        max_new_tokens=1024,  # upper bound on reasoning + answer tokens
        do_sample=False,  # greedy decoding, deterministic output
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=stop_token_id,
    )

    # Drop the prompt tokens so only the newly generated text is decoded.
    output_text = processor.batch_decode(
        generated_ids[:, inputs.input_ids.shape[1]:],
        skip_special_tokens=True,
    )[0]

print(output_text.strip())

Example output:

So, let's analyze the mammogram. The image shows a breast with some irregularities.
Looking at the options: A is about a well-circumscribed round mass, but the image doesn't show a clear mass.
B mentions clustered microcalcifications in an irregular density area. In mammograms, microcalcifications are often seen as small white spots, and irregular density might be a pattern.
C is a fat-containing lesion like a lipoma, but the image doesn't show fat density.
D is bilateral breast edema, which isn't visible here.
So B seems to fit because microcalcifications are a key finding in mammograms, especially when clustered.
</think>

B. Clustered microcalcifications within an area of irregular density
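The generated text contains the model's reasoning, a closing `</think>` tag, and then the final answer. A minimal sketch for separating the two and pulling out the chosen option, assuming the answer starts with a letter A through D followed by a period or parenthesis, as in the example output. `split_reasoning` and `extract_option` are hypothetical helpers, not part of the model's API.

```python
import re
from typing import Optional, Tuple


def split_reasoning(output_text: str) -> Tuple[str, str]:
    """Split generated text into (reasoning, answer) at the last </think> tag.

    If no tag is present, the whole text is treated as the answer.
    """
    reasoning, sep, answer = output_text.rpartition("</think>")
    if not sep:
        return "", output_text.strip()
    return reasoning.strip(), answer.strip()


def extract_option(answer: str) -> Optional[str]:
    """Return the option letter (A-D) at the start of the answer, or None."""
    match = re.match(r"\s*\(?([A-D])[.)]", answer)
    return match.group(1) if match else None


sample = (
    "So B seems to fit because microcalcifications are a key finding.\n"
    "</think>\n\n"
    "B. Clustered microcalcifications within an area of irregular density"
)
reasoning, answer = split_reasoning(sample)
print(extract_option(answer))  # B
```

Matching only a leading letter keeps the parser strict: if the model drifts from the "Letter. Option text" format, `extract_option` returns `None` instead of guessing.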

🔬 Technical Details

Training Configuration

Parameter	Value
Base model	Qwen/Qwen3-VL-2B-Thinking
Training data	MedMax (mint-medmax/medmax_data)
Fine-tuning type	LoRA SFT + GRPO
Precision	bfloat16
Hardware	Single Mac Mini (M4 Pro) with TrlMPS (https://github.com/krrish-v/trlmps)
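GRPO (Group Relative Policy Optimization) samples several completions per prompt and normalizes each completion's reward against its own group, so no separate value model is needed. The card does not describe the reward functions used, so the sketch below only shows the group-relative advantage step, A_i = (r_i − mean(r)) / (std(r) + ε), on made-up rewards. `group_advantages` and the reward values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against its group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std here; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt
# (for example, 1.0 if the chosen option is correct, else 0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_advantages(rewards)
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers get a positive advantage and incorrect ones a negative advantage of equal size; when every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.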

🙏 Acknowledgments

Base model: Qwen/Qwen3-VL-2B-Thinking by Alibaba Cloud
Training data: mint-medmax/medmax_data

📄 Citation

If you use this model in research, please cite:

@software{madrimedvl2b,
 title = {MadriMed-VL-2B: A Compact Multimodal Medical Vision-Language Model},
 author = {Madrisight},
 year = {2026},
 url = {https://huggingface.co/madrisight/MadriMed-VL-2B}
}

Disclaimer: This model is provided for research and educational purposes only. It is not FDA-approved, not clinically validated, and must not be used for patient care without expert human oversight. The authors assume no liability for clinical use.

Model details
Format	Safetensors
Model size	2B params
Tensor type	F32
Base model	Qwen/Qwen3-VL-2B-Thinking (19 fine-tunes listed, including this model)
Quantizations	1 model

URL: https://huggingface.co/madrisight/MadriMed-VL-2B-enc
