CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

Abstract

Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in vision-language foundation models (FMs) raise the possibility of automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that accurately interpret CXRs is challenging because of (1) the limited availability of large-scale vision-language datasets in the medical imaging domain, (2) the lack of vision and language encoders that can capture the complexities of medical data, and (3) the absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing CheXinstruct, a large-scale instruction-tuning dataset curated from 28 publicly available datasets. We then present CheXagent, an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network that bridges the vision and language modalities. Finally, we introduce CheXbench, a novel benchmark for systematically evaluating FMs across 8 clinically relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously developed general- and medical-domain FMs on CheXbench tasks. Furthermore, to improve model transparency, we perform a fairness evaluation across sex, race, and age to highlight potential performance disparities. More information about the project is available at https://stanford-aimi.github.io/chexagent.html.
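The fairness evaluation described above compares model performance across demographic subgroups. As a minimal sketch, assuming a binary task scored by accuracy and a simple best-minus-worst gap as the disparity measure (both are assumptions for illustration, not necessarily the metrics used with CheXbench), a per-subgroup comparison could look like this. The function names `subgroup_accuracy` and `disparity` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def subgroup_accuracy(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute accuracy per subgroup from (group, prediction, label) triples.

    Hypothetical helper: the accuracy metric is an illustrative assumption.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, pred, label in records:
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}


def disparity(acc_by_group: Dict[str, float]) -> float:
    """Return the gap between the best- and worst-performing subgroup.

    Hypothetical helper: max-minus-min is one simple disparity measure among many.
    """
    if not acc_by_group:
        raise ValueError("need at least one subgroup")
    return max(acc_by_group.values()) - min(acc_by_group.values())


# Toy example with two sex subgroups (made-up data, not results from the paper).
acc = subgroup_accuracy([("female", 1, 1), ("female", 0, 1), ("male", 1, 1), ("male", 0, 0)])
print(acc)             # {'female': 0.5, 'male': 1.0}
print(disparity(acc))  # 0.5
```

The same two functions apply unchanged to race or age-bucket labels; only the `group` field in each triple changes. A max-minus-min gap is easy to read but ignores subgroup sizes, so a real evaluation would likely also report per-group counts or confidence intervals.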