CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
January 22, 2024
作者: Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, Emily B. Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Gatidis, Akshay S. Chaudhari, Curtis Langlotz
cs.AI
Abstract
Chest X-rays (CXRs) are the most frequently performed imaging test in
clinical practice. Recent advances in the development of vision-language
foundation models (FMs) give rise to the possibility of performing automated
CXR interpretation, which can assist physicians with clinical decision-making
and improve patient outcomes. However, developing FMs that can accurately
interpret CXRs is challenging due to the (1) limited availability of
large-scale vision-language datasets in the medical image domain, (2) lack of
vision and language encoders that can capture the complexities of medical data,
and (3) absence of evaluation frameworks for benchmarking the abilities of FMs
on CXR interpretation. In this work, we address these challenges by first
introducing CheXinstruct - a large-scale instruction-tuning dataset
curated from 28 publicly available datasets. We then present CheXagent -
an instruction-tuned FM capable of analyzing and summarizing CXRs. To build
CheXagent, we design a clinical large language model (LLM) for parsing
radiology reports, a vision encoder for representing CXR images, and a network
to bridge the vision and language modalities. Finally, we introduce
CheXbench - a novel benchmark designed to systematically evaluate FMs
across 8 clinically relevant CXR interpretation tasks. Extensive quantitative
evaluations and qualitative reviews with five expert radiologists demonstrate
that CheXagent outperforms previously developed general- and medical-domain FMs
on CheXbench tasks. Furthermore, in an effort to improve model transparency, we
perform a fairness evaluation across factors of sex, race, and age to highlight
potential performance disparities. Our project is at
https://stanford-aimi.github.io/chexagent.html.
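To make the instruction-tuning setup concrete, here is a hypothetical sketch
of what a single CheXinstruct-style record might contain: an image reference
paired with a task instruction and a target response derived from the source
dataset. The field names, path, and text are illustrative assumptions, not
the actual CheXinstruct schema.

```python
# Hypothetical CheXinstruct-style record (illustrative only; field names
# and values are assumptions, not the published schema).
example_record = {
    "image_path": "images/example_frontal_cxr.jpg",  # placeholder path
    "task": "findings_generation",                   # one of many task types
    "instruction": "Describe the findings in this chest X-ray.",
    "response": (
        "The lungs are clear. No pleural effusion or pneumothorax. "
        "The cardiomediastinal silhouette is within normal limits."
    ),
}
```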
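The abstract describes a three-part design: a vision encoder for CXR images,
a bridging network, and a clinical LLM. The sketch below, written in PyTorch,
assumes a Q-Former-style bridge with learned query tokens (a common choice in
vision-language FMs; the paper's exact module may differ) that cross-attends
to patch features and projects them into the LLM's embedding space. All
dimensions and module choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisionLanguageBridge(nn.Module):
    """Projects vision-encoder patch features into an LLM's token space."""

    def __init__(self, vision_dim=1024, llm_dim=4096, num_query_tokens=32):
        super().__init__()
        # Learned queries summarize the image into a fixed number of tokens.
        self.queries = nn.Parameter(torch.randn(num_query_tokens, vision_dim))
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=vision_dim, num_heads=8, batch_first=True
        )
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_feats):
        # patch_feats: (batch, num_patches, vision_dim) from the vision encoder.
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        fused, _ = self.cross_attn(q, patch_feats, patch_feats)
        return self.proj(fused)  # (batch, num_query_tokens, llm_dim)

bridge = VisionLanguageBridge()
patch_feats = torch.randn(2, 196, 1024)  # e.g., a 14x14 ViT patch grid
visual_tokens = bridge(patch_feats)      # (2, 32, 4096)
# These visual tokens would be prepended to the instruction's token
# embeddings before decoding with the LLM.
```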