Towards Scalable Language-Image Pre-training for 3D Medical Imaging
May 28, 2025
Authors: Chenhui Zhao, Yiwei Lyu, Asadur Chowdury, Edward Harake, Akhil Kondepudi, Akshay Rao, Xinhai Hou, Honglak Lee, Todd Hollon
cs.AI
Abstract
Language-image pre-training has demonstrated strong performance in 2D medical
imaging, but its success in 3D modalities such as CT and MRI remains limited
due to the high computational demands of volumetric data, which pose a
significant barrier to training on large-scale, uncurated clinical studies. In
this study, we introduce Hierarchical attention for Language-Image Pre-training
(HLIP), a scalable pre-training framework for 3D medical imaging. HLIP adopts a
lightweight hierarchical attention mechanism inspired by the natural hierarchy
of radiology data: slice, scan, and study. This mechanism exhibits strong
generalizability, e.g., +4.3% macro AUC on the Rad-ChestCT benchmark when
pre-trained on CT-RATE. Moreover, the computational efficiency of HLIP enables
direct training on uncurated datasets. Trained on 220K patients with 3.13
million scans for brain MRI and 240K patients with 1.44 million scans for head
CT, HLIP achieves state-of-the-art performance, e.g., +32.4% balanced accuracy on
the proposed publicly available brain MRI benchmark Pub-Brain-5, and +1.4% and
+6.9% macro AUC on the head CT benchmarks RSNA and CQ500, respectively. These
results demonstrate that, with HLIP, directly pre-training on uncurated
clinical datasets is a scalable and effective direction for language-image
pre-training in 3D medical imaging. The code is available at
https://github.com/Zch0414/hlip.
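
As a rough illustration of why restricting attention to the slice, scan, or study level can be cheaper than attending over every token of a study at once, here is a minimal NumPy sketch. It is not the HLIP implementation: the function names, the token layout (scans, slices, patches, dim), and the use of attention without learned projections are all assumptions made for illustration. How HLIP actually arranges and combines these scopes is described in the paper and the repository.

```python
import numpy as np


def self_attention(x):
    """Softmax self-attention over axis -2, with Q = K = V = x (no learned weights)."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


def group_shape(shape, scope):
    """Hypothetical helper: (groups, tokens_per_group, dim) for a given scope."""
    scans, slices, patches, dim = shape
    if scope == "slice":   # tokens only see their own slice
        return (scans * slices, patches, dim)
    if scope == "scan":    # tokens see every slice of their own scan
        return (scans, slices * patches, dim)
    if scope == "study":   # tokens see every scan in the study
        return (1, scans * slices * patches, dim)
    raise ValueError(f"unknown scope: {scope}")


def scoped_attention(tokens, scope):
    """Apply self-attention independently inside each group of the chosen scope.

    tokens: array of shape (scans, slices, patches, dim) for a single study.
    """
    grouped = tokens.reshape(group_shape(tokens.shape, scope))
    return self_attention(grouped).reshape(tokens.shape)


def score_entries(shape, scope):
    """Number of pairwise attention scores computed for one layer at this scope."""
    groups, per_group, _ = group_shape(shape, scope)
    return groups * per_group * per_group


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    study = rng.normal(size=(2, 4, 8, 16))  # 2 scans, 4 slices, 8 patches, dim 16
    for scope in ("slice", "scan", "study"):
        out = scoped_attention(study, scope)
        print(scope, out.shape, score_entries(study.shape, scope))
```

In this toy setting the number of pairwise attention scores grows from slice scope to scan scope to study scope, which is the kind of cost difference a hierarchical design can exploit by running the widest scope sparingly.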