ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
August 2, 2023
Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Attila Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, Rory Pilgrim, Krish Eswaran, Andrew Sellergren
cs.AI
Abstract
Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or
ELIXR, leverages a language-aligned image encoder combined or grafted onto a
fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight
adapter architecture using images paired with corresponding free-text radiology
reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance
on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13
findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898
across five findings (atelectasis, cardiomegaly, consolidation, pleural
effusion, and pulmonary edema) when using 1% (~2,200 images) and 10% (~22,000 images)
of the training data, respectively), and semantic search (0.76 normalized discounted cumulative gain
(NDCG) across nineteen queries, including perfect retrieval on twelve of them).
Compared to existing data-efficient methods including supervised contrastive
learning (SupCon), ELIXR required two orders of magnitude less data to reach
similar performance. ELIXR also showed promise on CXR vision-language tasks,
demonstrating overall accuracies of 58.7% and 62.5% on visual question
answering and report quality assurance tasks, respectively. These results
suggest that ELIXR is a robust and versatile approach to CXR AI.
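
For readers unfamiliar with the semantic search metric quoted above, the following is a minimal sketch of how a normalized discounted cumulative gain (NDCG) score can be computed for one query. It is an illustration only, not the authors' evaluation code: the function names `dcg` and `ndcg`, the linear gain, the graded relevance scale, and the example relevance lists are all assumptions made here, and the ideal ranking is taken from the retrieved list alone rather than from a full image corpus.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    rels = relevances if k is None else relevances[:k]
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the best possible ordering."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal, k)
    if ideal_dcg == 0:
        # No relevant results at all: define the score as 0 to avoid dividing by zero.
        return 0.0
    return dcg(relevances, k) / ideal_dcg


# Hypothetical graded relevance of the top five retrieved images for one query
# (2 = fully relevant, 1 = partially relevant, 0 = not relevant).
perfect_ranking = [2, 2, 1, 0, 0]
imperfect_ranking = [0, 2, 1, 2, 0]

print(ndcg(perfect_ranking))    # 1.0, since the results are already in ideal order
print(ndcg(imperfect_ranking))  # strictly between 0 and 1
```

Under this definition, "perfect retrieval" on a query corresponds to an NDCG of 1.0 for that query, and a figure reported across several queries would typically be the mean of the per-query scores.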