ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

August 2, 2023
Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Attila Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, Rory Pilgrim, Krish Eswaran, Andrew Sellergren
cs.AI

Abstract

Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
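The semantic search result above is reported as normalized discounted cumulative gain (NDCG). As a rough illustration of how that metric behaves, here is a minimal sketch. It assumes graded relevance labels and the linear-gain form of DCG; the abstract does not say which NDCG variant or relevance scale the authors used, and the function names `dcg` and `ndcg` are illustrative only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain (an assumed variant)."""
    # Rank 0 is the top result; the discount is log2(rank + 2).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """NDCG of a ranked list of relevance labels, optionally cut off at k."""
    all_rels = list(relevances)
    ranked = all_rels[:k] if k is not None else all_rels
    # The ideal ranking sorts all labels from most to least relevant.
    ideal = sorted(all_rels, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    # With no relevant items at all, NDCG is defined here as 0.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Under this definition, a query with "perfect retrieval" is one whose results are already in ideal order, so its NDCG is exactly 1.0. A score averaged over queries falls below 1.0 whenever some queries rank less relevant images ahead of more relevant ones.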