ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Abstract

Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined with or grafted onto a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
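The semantic search result above is reported as NDCG averaged over queries. As a minimal sketch of how such a score can be computed, the Python below uses the common linear-gain, log2-discount form of NDCG. The abstract does not say which NDCG variant, cutoff, or relevance scale was used, so the function names (`dcg`, `ndcg`, `mean_ndcg`), the graded relevance labels, and the choice to build the ideal ranking from the retrieved items only are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """NDCG for one query.

    `relevances` holds the graded relevance of each retrieved image, in the
    order the system returned them (hypothetical labels, higher is better).
    Assumption: the ideal ranking is the same items sorted by relevance.
    """
    ranked = list(relevances)
    top = ranked[:k] if k is not None else ranked
    ideal = sorted(ranked, reverse=True)[:len(top)]
    ideal_dcg = dcg(ideal)
    # A query with no relevant results at all scores 0 by convention here.
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(per_query: Sequence[Sequence[float]],
              k: Optional[int] = None) -> float:
    """Average NDCG over a set of queries."""
    return sum(ndcg(r, k) for r in per_query) / len(per_query)
```

Under this definition a query scores 1.0 when every retrieved item appears in ideal relevance order, which is one way to read "perfect retrieval" on a query; whether the paper uses the same criterion is not stated in the abstract.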
