RadFinder
Links: Project page — Paper — Code — Models
Disease-Aware Vision–Language Pretraining for 3D CT
We pretrain a 3D CT vision–language model on 159k report–volume pairs with two new supervision signals: prompt-based disease labels for classification and intra-scan snippet localization for axial depth grounding. A single unified model reaches state-of-the-art retrieval on CT-RATE, competitive disease classification, and slice-level localization at 12 mm resolution.
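To make the retrieval claim concrete, here is a minimal sketch of how report-to-volume Recall@k is commonly computed from paired embeddings. It is not taken from the RadFinder code: the function name `recall_at_k`, the array shapes, and the assumption that row `i` of both arrays belongs to the same report–volume pair are all illustrative.

```python
import numpy as np


def recall_at_k(text_emb: np.ndarray, vol_emb: np.ndarray, k: int) -> float:
    """Fraction of reports whose paired volume ranks among the top k.

    Assumes (hypothetically) that row i of text_emb and row i of vol_emb
    come from the same report-volume pair.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = vol_emb / np.linalg.norm(vol_emb, axis=1, keepdims=True)
    sim = t @ v.T  # shape: (n_reports, n_volumes)

    # Rank of the true pair = number of volumes scoring strictly higher.
    true_scores = np.diag(sim)[:, None]
    ranks = (sim > true_scores).sum(axis=1)
    return float((ranks < k).mean())


# Toy data: each "report" embedding is a slightly noisy copy of its volume.
rng = np.random.default_rng(0)
vol = rng.normal(size=(8, 16))
txt = vol + 0.01 * rng.normal(size=(8, 16))
print(recall_at_k(txt, vol, k=1))
```

In practice the embeddings would come from the model's text and image encoders; the random arrays above only stand in for them.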
Usage
See the GitHub repository.
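As a starting point, the weights can likely be fetched with the generic `huggingface_hub` client, as sketched below. This only downloads the files of the `lmb-freiburg/radfinder` model repository; the target directory name is a hypothetical choice, and building the model and running inference are covered by the GitHub repository, not by this snippet.

```python
REPO_ID = "lmb-freiburg/radfinder"  # model repository on the Hugging Face Hub


def download_weights(local_dir: str = "radfinder_weights") -> str:
    """Download all files of the model repository and return the local path.

    `local_dir` is a hypothetical default; pick any writable directory.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example (requires network access and `pip install huggingface_hub`):
#   path = download_weights()
```

Note that the weights are released under CC BY-NC-SA 4.0, so downloading them this way does not change the licensing terms described under License.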
Training data
- RefCT (internal): ~98k report–volume pairs from ~50k patients at a single hospital; in-house clinical data, not publicly released.
- CT-RATE (CC BY-NC-SA 4.0)
- Merlin (Stanford AIMI non-commercial research DUA)
- INSPECT (Stanford AIMI non-commercial research DUA)
Further acknowledgements
- The model and parts of the SigLIP training framework in `src/radfinder` are based on SPECTRE.
- The text processing pipeline in `src/rate` is used to create binary labels based on text reports and is based on RATE.
- We thank the MONAI, timm, and Hugging Face transformers maintainers for the libraries and all other package maintainers listed in `requirements.txt`.
- The demo scan under `assets/demo/s0859/` is case `s0859` from TotalSegmentator v2 (Wasserthal et al., CC-BY-4.0).
- Funding, additional acknowledgements, full citations: see paper.
License
- All code is MIT (see `LICENSE`) unless a file header says otherwise. Files in `src/rate/` that carry a `# Vendored from YalaLab/rate ... (ECL 2.0)` header are derivatives of the upstream rate package and are licensed under ECL 2.0 (see `LICENSE_RATE`).
- RadFinder model weights are CC BY-NC-SA 4.0, see `LICENSE_MODELS`.
  - Note: the weights are subject to the original dataset licenses. Users intending to use RadFinder in commercial settings should verify dataset and model licensing and obtain any required permissions.
Citation
If you use this code, models, or results, please cite:
@inproceedings{ging2026radfinder,
author = {Simon Ging and Philipp Arnold and Sebastian Walter and Hani Alnahas and Hannah Bast and Elmar Kotter and Jiancheng Yang and Behzad Bozorgtabar and Thomas Brox},
title = {Learning to Read Where to Look: Disease-Aware Vision--Language Pretraining for 3{D} {CT}},
booktitle = {Medical Image Computing and Computer Assisted Intervention -- {MICCAI} 2026, Strasbourg, France, September 27 -- October 1, 2026, Proceedings},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
year = {2026},
note = {To appear},
}
Model size: 1B params (Safetensors, tensor type F32)
Model tree for lmb-freiburg/radfinder: base model cclaess/SPECTRE