CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

January 22, 2024
Authors: Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, Emily B. Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Gatidis, Akshay S. Chaudhari, Curtis Langlotz
cs.AI

Abstract

Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing CheXinstruct, a large-scale instruction-tuning dataset curated from 28 publicly available datasets. We then present CheXagent, an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce CheXbench, a novel benchmark designed to systematically evaluate FMs across 8 clinically relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously developed general- and medical-domain FMs on CheXbench tasks. Furthermore, in an effort to improve model transparency, we perform a fairness evaluation across factors of sex, race, and age to highlight potential performance disparities. Our project is available at https://stanford-aimi.github.io/chexagent.html.
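
The three-component design named in the abstract (a vision encoder for CXR images, a network bridging the vision and language modalities, and a clinical LLM) follows a pattern common to instruction-tuned vision-language FMs. The PyTorch sketch below illustrates that pattern only: every class name, module choice, and dimension is an illustrative assumption, not code from the CheXagent project.

```python
# Minimal sketch of the vision encoder -> bridge -> LLM composition described
# in the abstract. All names, modules, and dimensions are assumptions made
# for illustration; they are not taken from the CheXagent codebase.
import torch
import torch.nn as nn


class PatchEncoder(nn.Module):
    """Stand-in CXR vision encoder: embeds 16x16 image patches."""

    def __init__(self, dim: int = 768, patch: int = 16):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        feats = self.embed(pixels)               # (B, dim, H/16, W/16)
        return feats.flatten(2).transpose(1, 2)  # (B, num_patches, dim)


class Bridge(nn.Module):
    """Projects vision features into the LLM's token-embedding space."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_feats)            # (B, num_patches, llm_dim)


class CXRInstructModel(nn.Module):
    """Composes encoder, bridge, and language model for instruction tuning."""

    def __init__(self, encoder: nn.Module, bridge: nn.Module, llm: nn.Module):
        super().__init__()
        self.encoder, self.bridge, self.llm = encoder, bridge, llm

    def forward(self, pixels: torch.Tensor, text_embeds: torch.Tensor):
        visual_tokens = self.bridge(self.encoder(pixels))
        # Prepend the projected visual tokens to the instruction embeddings
        # so the language model attends over image and text jointly.
        return self.llm(torch.cat([visual_tokens, text_embeds], dim=1))


if __name__ == "__main__":
    # Toy run with a small transformer standing in for the clinical LLM.
    llm = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(512, 8, batch_first=True), num_layers=2
    )
    model = CXRInstructModel(PatchEncoder(dim=768), Bridge(768, 512), llm)
    out = model(torch.randn(1, 3, 224, 224), torch.randn(1, 32, 512))
    print(out.shape)  # (1, 228, 512): 196 visual tokens + 32 text tokens
```

The MLP bridge shown here is only one option; query-based bridging modules are also common in this family of models, and the abstract does not specify which design CheXagent uses.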