VLM3D Challenge – Task 3: Self‑Supervised Multi‑Abnormality Localization
Welcome to Task 3 of the Vision‑Language Modeling in 3D Medical Imaging (VLM3D) Challenge. Here, systems must localize five key thoracic pathologies in 3‑D chest CT volumes—without any voxel‑level labels during training.
Contents
- Overview
- Dataset
- Task Objective
- Participation Rules
- Evaluation & Ranking
- Prizes & Publication
- Citation
- Contact
Overview
Explainable AI for chest CT demands precise localization of abnormalities such as effusions and nodules. Current 3‑D methods rely on expensive voxel‑wise annotations; this task instead promotes self‑supervised discovery:
- Enhances trust by highlighting abnormal regions for clinicians
- Scales to large datasets without manual masks
- Bootstraps downstream tasks (segmentation, radiomics)
Task 3 exploits CT‑RATE’s weak labels and a new manually annotated test set to benchmark localization under sparse supervision.
Dataset
| Split | Patients | CT Volumes | Labels per Scan | Localization Masks | Source |
|---|---|---|---|---|---|
| Train | 20 000 | ≈ 47 k | binary (weak) | — | Istanbul Medipol University |
| Validation | 1 304 | ≈ 3 k | binary (weak) | — | Istanbul Medipol University |
| Test | 2 000 | 2 000 | hidden | ✓ (verified) | Istanbul Medipol University |
Target pathologies:
- Pericardial effusion
- Pleural effusion
- Consolidation
- Ground glass opacity
- Lung nodule
Training and validation sets carry only scan‑level (weak) labels; voxel‑level masks exist only for the hidden test set and are used solely for evaluation.
Task Objective
For each CT scan, submit five 3‑D heat‑maps (or binary masks), one per target abnormality, indicating its probable spatial extent. Each map must align with the original voxel grid of the provided NIfTI volume.
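As a concrete illustration of the expected output geometry, the sketch below writes one heat‑map per abnormality on the input grid by reusing the source affine and header. It assumes nibabel and placeholder file names; the zero array stands in for a model's prediction and is not part of any official starter code.

```python
import nibabel as nib
import numpy as np

ABNORMALITIES = [
    "pericardial_effusion", "pleural_effusion", "consolidation",
    "ground_glass_opacity", "lung_nodule",
]

ct = nib.load("example_ct.nii.gz")   # provided CT volume (placeholder path)
volume = ct.get_fdata()

for name in ABNORMALITIES:
    # Replace this placeholder with your model's probability map in [0, 1];
    # it must have exactly the same shape as the input volume.
    heatmap = np.zeros(volume.shape, dtype=np.float32)

    # Reusing the input affine and header keeps spacing and orientation
    # identical to the provided NIfTI volume.
    out = nib.Nifti1Image(heatmap, affine=ct.affine, header=ct.header)
    out.set_data_dtype(np.float32)
    nib.save(out, f"example_ct_{name}.nii.gz")
```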
Participation Rules
- Method type: Fully automatic inference; any self‑supervision or weak supervision strategy allowed.
- Training data: CT‑RATE + any public data or models; no private voxel masks.
- Output format: One NIfTI file per abnormality, on the same voxel grid and spacing as the input volume.
- Submissions: Max 1 run/day; last valid run counts.
- Organizers: May appear on the leaderboard but are ineligible for prizes.
Evaluation & Ranking
Localization Metrics
| Metric | What it Captures |
|---|---|
| Dice Similarity Coefficient (DSC) | Volumetric overlap between predicted and reference regions |
| Intersection over Union (IoU) | Overlap relative to the union of predicted and reference regions |
| Hausdorff Distance (95th percentile, HD95) | Boundary mismatch between predicted and reference surfaces |
| Sensitivity | Fraction of abnormal voxels recovered |
Scores are computed per abnormality and then macro‑averaged across the five classes.
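For reference, these metrics can be computed directly from binary masks, as in the minimal sketch below (assuming NumPy and SciPy). The surface‑based HD95 shown here is one common variant; this is not the official evaluation code.

```python
import numpy as np
from scipy import ndimage


def overlap_metrics(pred, ref, eps=1e-8):
    """Dice, IoU, and sensitivity for two binary 3-D masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()
    dice = 2.0 * tp / (pred.sum() + ref.sum() + eps)
    iou = tp / (np.logical_or(pred, ref).sum() + eps)
    sensitivity = tp / (ref.sum() + eps)
    return dice, iou, sensitivity


def hd95(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric surface distance (one common HD95 variant).
    Empty masks would need special handling and are not covered here."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    pred_surf = pred ^ ndimage.binary_erosion(pred)
    ref_surf = ref ^ ndimage.binary_erosion(ref)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    d_to_ref = ndimage.distance_transform_edt(~ref_surf, sampling=spacing)
    d_to_pred = ndimage.distance_transform_edt(~pred_surf, sampling=spacing)
    distances = np.concatenate([d_to_ref[pred_surf], d_to_pred[ref_surf]])
    return float(np.percentile(distances, 95)) if distances.size else 0.0
```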
Final Ranking
- Two‑sided permutation tests (10 000 samples) compare every metric between each pair of teams (see the sketch after this list).
- Each significant win = 1 point.
- Teams ordered by total points (ties share rank).
- Missing maps incur the minimum possible score for that case.
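For illustration, a pairwise comparison between two teams on one metric might look like the sketch below; the per‑case pairing, the 5 % significance level, and all implementation details are assumptions rather than the official ranking code.

```python
import numpy as np


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided p-value for the mean paired difference A - B (sign-flip test)."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = diffs.mean()

    count = 0
    for _ in range(n_permutations):
        # Randomly swap the team labels per case, i.e. flip the sign of the difference.
        signs = rng.choice([-1.0, 1.0], size=diffs.size)
        if abs((signs * diffs).mean()) >= abs(observed):
            count += 1
    return (count + 1) / (n_permutations + 1)


# Hypothetical usage: team A scores a "win" on this metric if the difference is
# significant (assumed 5 % level) and in A's favour.
# p = permutation_test(dice_team_a, dice_team_b)
# win_for_a = p < 0.05 and np.mean(dice_team_a) > np.mean(dice_team_b)
```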
Prizes & Publication
- Awards – details TBA.
- Every team with a valid submission will be invited to co‑author the joint challenge paper (MedIA / IEEE TMI).
- An overview manuscript describing baseline results will appear on arXiv before the test phase closes.
Citation
```bibtex
@article{hamamci2024developing,
  title   = {Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography},
  author  = {Hamamci, Ibrahim Ethem and Er, Sezgin and others},
  journal = {arXiv preprint arXiv:2403.17834},
  year    = {2024}
}
```
Contact
Open an issue or post on the challenge forum for technical questions. For other matters, use “Help → Email organizers” on the challenge site.