ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Abstract

Our approach, which we call Embeddings for Language/Image-aligned X-Rays (ELIXR), leverages a language-aligned image encoder combined with, or grafted onto, a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on three tasks: zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings); data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) of the training data, respectively); and semantic search (normalized discounted cumulative gain (NDCG) of 0.76 across 19 queries, including perfect retrieval on 12 of them). Compared to existing data-efficient methods, including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
