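ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders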

Abstract

Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, uses a language-aligned image encoder that is combined with, or grafted onto, a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with their corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings, namely atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema, when trained on 1% (~2,200 images) and 10% (~22,000 images) of the training data, respectively), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, with perfect retrieval on twelve of them). Compared to existing data-efficient methods, including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, achieving overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
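The semantic search result above is reported as normalized discounted cumulative gain (NDCG). The sketch below shows how NDCG is commonly computed for one ranked result list and then averaged over queries. It is not taken from the ELIXR evaluation code: the graded relevance values, the optional cutoff `k`, and the helper names `dcg`, `ndcg`, and `mean_ndcg` are hypothetical, and it assumes relevance judgments are available only for the retrieved items, so the ideal ranking is formed by re-sorting those same items.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1..n."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances: Sequence[float], k: Optional[int] = None) -> float:
    """NDCG@k for one query: DCG of the system ranking divided by DCG of the ideal ranking.

    `ranked_relevances` holds hypothetical graded relevance scores for the retrieved
    images, in the order the search system returned them. Assumption: the ideal
    ranking is built by sorting these same retrieved items by relevance.
    """
    if k is None:
        k = len(ranked_relevances)
    actual = dcg(ranked_relevances[:k])
    ideal = dcg(sorted(ranked_relevances, reverse=True)[:k])
    # A query with no relevant results gets NDCG 0 rather than a division by zero.
    return actual / ideal if ideal > 0 else 0.0


def mean_ndcg(per_query: Sequence[Sequence[float]], k: Optional[int] = None) -> float:
    """Average NDCG over several queries (unweighted mean)."""
    scores = [ndcg(relevances, k) for relevances in per_query]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical relevance grades for three retrieved images per query.
    perfect = [2, 1, 0]   # most relevant image ranked first
    reversed_ = [0, 1, 2]  # most relevant image ranked last
    print(ndcg(perfect))
    print(ndcg(reversed_))
    print(mean_ndcg([perfect, reversed_]))
```

Under this definition, "perfect retrieval" on a query corresponds to an NDCG of 1.0, because the system's ranking matches the ideal ranking. Any misordering of relevant results lowers the score toward 0, and the reported 0.76 is an average of per-query scores of this kind.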
