VLM3D Challenge – Task 3: Self‑Supervised Multi‑Abnormality Localization
Welcome to Task 3 of the Vision‑Language Modeling in 3D Medical Imaging (VLM3D) Challenge. Here, systems must localize five key thoracic pathologies in 3‑D chest CT volumes—without any voxel‑level labels during training.
Contents
- Overview
- Dataset
- Task Objective
- Participation Rules
- Evaluation & Ranking
- Prizes & Publication
- Citation
- Contact
Overview
Explainable AI for chest CT demands precise localization of abnormalities such as effusions and nodules. Current 3‑D methods rely on expensive voxel‑wise annotations; this task instead promotes self‑supervised discovery:
- Enhances trust by highlighting abnormal regions for clinicians
- Scales to large datasets without manual masks
- Bootstraps downstream tasks (segmentation, radiomics)
Task 3 exploits CT‑RATE’s weak labels and a new manually annotated test set to benchmark localization under sparse supervision.
Dataset
| Split | Patients | CT Volumes | Labels per Scan | Localization Masks | Source |
|---|---|---|---|---|---|
| Train | 20 000 | ≈ 47 k | binary (weak) | — | Istanbul Medipol University |
| Validation | 1 304 | ≈ 3 k | binary (weak) | — | Istanbul Medipol University |
| Test | 2 000 | 2 000 | hidden | ✓ (verified) | Istanbul Medipol University |
Target pathologies:
- Pericardial effusion
- Pleural effusion
- Consolidation
- Ground glass opacity
- Lung nodule
Training and validation sets carry only scan‑level (weak) labels; voxel‑level masks exist only for the hidden test set and are used solely for evaluation.
Task Objective
For each CT scan, submit five 3‑D heat‑maps (or binary masks), one per target abnormality, indicating its probable spatial extent. Each map must align with the original voxel grid of the provided NIfTI volume.
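As a concrete illustration of the expected output geometry, the sketch below writes one heat‑map per abnormality on the input grid by reusing the source affine and header. It assumes nibabel and placeholder file names; the zero array stands in for a model's prediction and is not part of any official starter code.

```python
import nibabel as nib
import numpy as np

ABNORMALITIES = [
    "pericardial_effusion", "pleural_effusion", "consolidation",
    "ground_glass_opacity", "lung_nodule",
]

ct = nib.load("example_ct.nii.gz")   # provided CT volume (placeholder path)
volume = ct.get_fdata()

for name in ABNORMALITIES:
    # Replace this placeholder with your model's probability map in [0, 1];
    # it must have exactly the same shape as the input volume.
    heatmap = np.zeros(volume.shape, dtype=np.float32)

    # Reusing the input affine and header keeps spacing and orientation
    # identical to the provided NIfTI volume.
    out = nib.Nifti1Image(heatmap, affine=ct.affine, header=ct.header)
    out.set_data_dtype(np.float32)
    nib.save(out, f"example_ct_{name}.nii.gz")
```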
Participation Rules
- Method type: Fully automatic inference; any self‑supervision or weak supervision strategy allowed.
- Training data: CT‑RATE + any public data or models; no private voxel masks.
- Output format: One NIfTI file per abnormality, on the same voxel grid and spacing as the input volume.
- Submissions: Max 1 run/day; last valid run counts.
- Organizers: May appear on the leaderboard but are ineligible for prizes.
Evaluation & Ranking
Localization Metrics
| Metric | What it Captures |
|---|---|
| Dice Similarity Coefficient (DSC) | Volumetric overlap between predicted and reference regions |
| Intersection over Union (IoU) | Overlap relative to the union of predicted and reference regions |
| Hausdorff Distance (95th percentile, HD95) | Boundary mismatch between predicted and reference surfaces |
| Sensitivity | Fraction of abnormal voxels recovered |
Scores are computed per abnormality and then macro‑averaged across the five classes.
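For reference, these metrics can be computed directly from binary masks, as in the minimal sketch below (assuming NumPy and SciPy). The surface‑based HD95 shown here is one common variant; this is not the official evaluation code.

```python
import numpy as np
from scipy import ndimage


def overlap_metrics(pred, ref, eps=1e-8):
    """Dice, IoU, and sensitivity for two binary 3-D masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()
    dice = 2.0 * tp / (pred.sum() + ref.sum() + eps)
    iou = tp / (np.logical_or(pred, ref).sum() + eps)
    sensitivity = tp / (ref.sum() + eps)
    return dice, iou, sensitivity


def hd95(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric surface distance (one common HD95 variant).
    Empty masks would need special handling and are not covered here."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    pred_surf = pred ^ ndimage.binary_erosion(pred)
    ref_surf = ref ^ ndimage.binary_erosion(ref)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    d_to_ref = ndimage.distance_transform_edt(~ref_surf, sampling=spacing)
    d_to_pred = ndimage.distance_transform_edt(~pred_surf, sampling=spacing)
    distances = np.concatenate([d_to_ref[pred_surf], d_to_pred[ref_surf]])
    return float(np.percentile(distances, 95)) if distances.size else 0.0
```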
Final Ranking
- Two‑sided permutation tests (10 000 samples) compare every metric between each pair of teams (see the sketch after this list).
- Each significant win = 1 point.
- Teams ordered by total points (ties share rank).
- Missing maps incur the minimum possible score for that case.
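For illustration, a pairwise comparison between two teams on one metric might look like the sketch below; the per‑case pairing, the 5 % significance level, and all implementation details are assumptions rather than the official ranking code.

```python
import numpy as np


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided p-value for the mean paired difference A - B (sign-flip test)."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = diffs.mean()

    count = 0
    for _ in range(n_permutations):
        # Randomly swap the team labels per case, i.e. flip the sign of the difference.
        signs = rng.choice([-1.0, 1.0], size=diffs.size)
        if abs((signs * diffs).mean()) >= abs(observed):
            count += 1
    return (count + 1) / (n_permutations + 1)


# Hypothetical usage: team A scores a "win" on this metric if the difference is
# significant (assumed 5 % level) and in A's favour.
# p = permutation_test(dice_team_a, dice_team_b)
# win_for_a = p < 0.05 and np.mean(dice_team_a) > np.mean(dice_team_b)
```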
Prizes & Publication
- Awards – details TBA.
- Every team with a valid submission will be invited to co‑author the joint challenge paper (MedIA / IEEE TMI).
- An overview manuscript describing baseline results will appear on arXiv before the test phase closes.
Citation
```bibtex
@article{hamamci2024developing,
  title   = {Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography},
  author  = {Hamamci, Ibrahim Ethem and Er, Sezgin and others},
  journal = {arXiv preprint arXiv:2403.17834},
  year    = {2024}
}
```
Contact
Open an issue or post on the challenge forum for technical questions. For other matters, use “Help → Email organizers” on the challenge site.