Towards Scalable Language-Image Pre-training for 3D Medical Imaging

Abstract

Language-image pre-training has demonstrated strong performance in 2D medical imaging, but its success in 3D modalities such as CT and MRI remains limited due to the high computational demands of volumetric data, which pose a significant barrier to training on large-scale, uncurated clinical studies. In this study, we introduce Hierarchical attention for Language-Image Pre-training (HLIP), a scalable pre-training framework for 3D medical imaging. HLIP adopts a lightweight hierarchical attention mechanism inspired by the natural hierarchy of radiology data: slice, scan, and study. This mechanism exhibits strong generalizability, e.g., +4.3% macro AUC on the Rad-ChestCT benchmark when pre-trained on CT-RATE. Moreover, the computational efficiency of HLIP enables direct training on uncurated datasets. Trained on 220K patients with 3.13 million scans for brain MRI and 240K patients with 1.44 million scans for head CT, HLIP achieves state-of-the-art performance, e.g., +32.4% balanced ACC on the proposed publicly available brain MRI benchmark Pub-Brain-5; +1.4% and +6.9% macro AUC on head CT benchmarks RSNA and CQ500, respectively. These results demonstrate that, with HLIP, directly pre-training on uncurated clinical datasets is a scalable and effective direction for language-image pre-training in 3D medical imaging. The code is available at https://github.com/Zch0414/hlip
