

Towards Scalable Language-Image Pre-training for 3D Medical Imaging

May 28, 2025
Authors: Chenhui Zhao, Yiwei Lyu, Asadur Chowdury, Edward Harake, Akhil Kondepudi, Akshay Rao, Xinhai Hou, Honglak Lee, Todd Hollon
cs.AI

Abstract

Language-image pre-training has demonstrated strong performance in 2D medical imaging, but its success in 3D modalities such as CT and MRI remains limited due to the high computational demands of volumetric data, which pose a significant barrier to training on large-scale, uncurated clinical studies. In this study, we introduce Hierarchical attention for Language-Image Pre-training (HLIP), a scalable pre-training framework for 3D medical imaging. HLIP adopts a lightweight hierarchical attention mechanism inspired by the natural hierarchy of radiology data: slice, scan, and study. This mechanism exhibits strong generalizability, e.g., +4.3% macro AUC on the Rad-ChestCT benchmark when pre-trained on CT-RATE. Moreover, the computational efficiency of HLIP enables direct training on uncurated datasets. Trained on 220K patients with 3.13 million scans for brain MRI and 240K patients with 1.44 million scans for head CT, HLIP achieves state-of-the-art performance, e.g., +32.4% balanced ACC on the proposed publicly available brain MRI benchmark Pub-Brain-5; +1.4% and +6.9% macro AUC on head CT benchmarks RSNA and CQ500, respectively. These results demonstrate that, with HLIP, directly pre-training on uncurated clinical datasets is a scalable and effective direction for language-image pre-training in 3D medical imaging. The code is available at https://github.com/Zch0414/hlip
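The abstract describes the slice/scan/study hierarchy only at a high level. The sketch below illustrates one way such hierarchical attention *could* work. It assumes that each patch token is tagged with a (scan, slice) position and that each attention layer is restricted to tokens in the same slice, the same scan, or the whole study. The helper names (`allowed`, `masked_self_attention`, `hierarchical_encoder`, `attention_pairs`), the Q = K = V simplification (no learned projections), and the layer schedule are hypothetical illustrations. They are not the HLIP implementation in the linked repository.

```python
import math
from typing import List, Tuple

Token = Tuple[int, int]    # hypothetical (scan_id, slice_id) tag for a patch token
Matrix = List[List[float]]

LEVELS = ("slice", "scan", "study")


def allowed(level: str, a: Token, b: Token) -> bool:
    """Return True if token a may attend to token b at the given hierarchy level."""
    if level == "slice":
        return a == b          # same scan and same slice
    if level == "scan":
        return a[0] == b[0]    # same scan, any slice
    if level == "study":
        return True            # every token in the study
    raise ValueError(f"unknown level: {level}")


def masked_self_attention(x: Matrix, positions: List[Token], level: str) -> Matrix:
    """Single-head scaled dot-product self-attention restricted to one hierarchy level.

    Simplification (assumption): queries, keys, and values are the inputs themselves,
    with no learned projection matrices.
    """
    d = len(x[0])
    out: Matrix = []
    for i, q in enumerate(x):
        # Only keys in the same group are scored; excluded keys act like a -inf mask.
        keys = [j for j in range(len(x)) if allowed(level, positions[i], positions[j])]
        scores = [sum(qa * ka for qa, ka in zip(q, x[j])) / math.sqrt(d) for j in keys]
        m = max(scores)                          # subtract max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * x[j][k] for w, j in zip(weights, keys)) / z for k in range(d)])
    return out


def hierarchical_encoder(x: Matrix, positions: List[Token], schedule: List[str]) -> Matrix:
    """Apply one restricted attention layer per entry in a (hypothetical) level schedule."""
    for level in schedule:
        attended = masked_self_attention(x, positions, level)
        # Residual connection: add the attention output back to the layer input.
        x = [[a + b for a, b in zip(row, h)] for row, h in zip(x, attended)]
    return x


def attention_pairs(positions: List[Token], level: str) -> int:
    """Count query-key pairs evaluated at a level, a rough proxy for attention cost."""
    return sum(1 for a in positions for b in positions if allowed(level, a, b))


if __name__ == "__main__":
    # Toy study: 2 scans x 4 slices x 8 patch tokens per slice = 64 tokens.
    positions = [(scan, z) for scan in range(2) for z in range(4) for _ in range(8)]
    for level in LEVELS:
        print(level, attention_pairs(positions, level))  # slice 512, scan 2048, study 4096
    x = [[float((i * 7) % 5), float((i * 3) % 4)] for i in range(len(positions))]
    y = hierarchical_encoder(x, positions, ["slice", "scan", "study"])
    print(len(y), len(y[0]))  # 64 2
```

In this toy example, restricting attention to smaller groups cuts the number of query-key pairs from 4,096 (study level) to 2,048 (scan level) or 512 (slice level). This is one way a hierarchical scheme can lower the cost of attending over volumetric data. The abstract does not say which levels HLIP uses in which layers.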

