CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

Abstract

Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in vision-language foundation models (FMs) raise the possibility of automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challenging due to (1) the limited availability of large-scale vision-language datasets in the medical image domain, (2) the lack of vision and language encoders that can capture the complexities of medical data, and (3) the absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing CheXinstruct, a large-scale instruction-tuning dataset curated from 28 publicly available datasets. We then present CheXagent, an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce CheXbench, a novel benchmark designed to systematically evaluate FMs across 8 clinically relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously developed general- and medical-domain FMs on CheXbench tasks. Furthermore, to improve model transparency, we perform a fairness evaluation across sex, race, and age to highlight potential performance disparities. The project page is at https://stanford-aimi.github.io/chexagent.html.

