Pillar-0: A New Frontier for Radiology Foundation Models

November 21, 2025
Authors: Kumar Krishna Agrawal, Longchao Liu, Long Lian, Michael Nercessian, Natalia Harguindeguy, Yufu Wu, Peter Mikhael, Gigin Lin, Lecia V. Sequist, Florian Fintelmann, Trevor Darrell, Yutong Bai, Maggie Chung, Adam Yala
cs.AI

Abstract

Radiology plays an integral role in modern medicine, yet rising imaging volumes have far outpaced workforce growth. Foundation models offer a path toward assisting with the full spectrum of radiology tasks, but existing medical models remain limited: they process volumetric CT and MRI as low-fidelity 2D slices, discard critical grayscale contrast information, and lack evaluation frameworks that reflect real clinical practice. We introduce Pillar-0, a radiology foundation model pretrained on 42,990 abdomen-pelvis CTs, 86,411 chest CTs, 14,348 head CTs, and 11,543 breast MRIs from a large academic center, together with RATE, a scalable framework that uses LLMs to extract structured labels for 366 radiologic findings with near-perfect accuracy. Across internal test sets of 14,230 abdomen-pelvis CTs, 10,646 chest CTs, 4,906 head CTs, and 1,585 breast MRIs, Pillar-0 establishes a new performance frontier, achieving mean AUROCs of 86.4, 88.0, 90.1, and 82.9, respectively. It outperforms MedGemma (Google), MedImageInsight (Microsoft), Lingshu (Alibaba), and Merlin (Stanford) by 7.8-15.8 AUROC points and ranks best in 87.2% (319/366) of tasks. Pillar-0 similarly outperforms all baselines in an external validation on the Stanford Abdominal CT dataset, including Merlin (82.2 vs. 80.6 AUROC). Pillar-0 also extends to tasks beyond its pretraining, such as long-horizon lung cancer risk prediction, where it improves upon the state-of-the-art Sybil model by 3.0 C-index points on NLST and generalizes with gains of 5.9 on the MGH dataset and 1.9 on the CGMH dataset. In brain hemorrhage detection, Pillar-0 achieves an AUROC above 95 using only 1/20th as much training data as the next most sample-efficient baseline. Pillar-0 and RATE together provide an open, clinically rigorous foundation for building high-performance radiology systems, enabling applications that were previously infeasible due to computational, data, and evaluation constraints.
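The abstract summarizes performance with three standard metrics: per-finding AUROC averaged over tasks (reported on a 0-100 "points" scale), the share of tasks on which a model ranks first, and the concordance index (C-index) for long-horizon lung cancer risk. The minimal Python sketch below illustrates these metrics on invented toy data. It is not the authors' evaluation code; the function names (`auroc`, `harrell_c_index`, `mean_auroc_points`, `best_task_fraction`), the tie handling, and the choice of Harrell's formulation of the C-index are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's C-index for censored time-to-event data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before times[j]; it is concordant when
    subject i was also assigned the higher risk. Risk ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_auroc_points(task_aurocs: Sequence[float]) -> float:
    """Average per-task AUROC, expressed on a 0-100 'points' scale."""
    return 100.0 * sum(task_aurocs) / len(task_aurocs)


def best_task_fraction(model: str, results: Dict[str, List[float]]) -> float:
    """Share of tasks on which `model` has the highest AUROC (ties count as a win).

    `results` maps a model name to its per-task AUROCs, aligned by task index.
    """
    n_tasks = len(results[model])
    wins = sum(
        1
        for t in range(n_tasks)
        if results[model][t] >= max(r[t] for r in results.values())
    )
    return wins / n_tasks


if __name__ == "__main__":
    # Toy data only; these numbers are invented and unrelated to the paper.
    print(auroc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7]))  # 0.75
    print(harrell_c_index([2, 5, 3, 8], [1, 0, 1, 1], [0.9, 0.7, 0.6, 0.1]))  # 0.8
    results = {"model_a": [0.90, 0.80, 0.70], "model_b": [0.85, 0.82, 0.60]}
    print(round(mean_auroc_points(results["model_a"]), 1))  # 80.0
    print(best_task_fraction("model_a", results))  # 2 of 3 tasks
```

On the points scale, a 0.01 difference in AUROC is one point, which is how gaps such as 82.2 vs. 80.6 are expressed; likewise, 319 first-place finishes out of 366 tasks corresponds to a `best_task_fraction` of about 0.872. The pairwise loops are O(n²) and chosen for readability; for large test sets a sort-based implementation, or a library routine such as scikit-learn's `roc_auc_score`, would be the usual choice.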
February 7, 2026