ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Abstract

Our approach, which we call Embeddings for Language/Image-aligned X-Rays (ELIXR), leverages a language-aligned image encoder combined with or grafted onto a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) of the training data, respectively), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods, including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
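
For readers unfamiliar with the semantic search metric, the following is a minimal sketch of how normalized discounted cumulative gain (NDCG) can be computed for one query. The abstract does not state the gain function, the rank cutoff, or the relevance scale used, so the linear gain, the full-list cutoff, and the example relevance grades below are all assumptions for illustration, not the authors' evaluation code.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain (an assumed variant)."""
    # Rank positions start at 1, so the discount is log2(rank + 1).
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the returned order divided by DCG of the ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    # A query with no relevant results has no meaningful ranking; return 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance of the results returned for two queries,
# listed in the order the retrieval system ranked them.
perfect = ndcg([2, 2, 1, 0, 0])    # already in ideal order
imperfect = ndcg([0, 2, 1, 2, 0])  # relevant images ranked too low
```

Under this definition a query counts as perfectly retrieved when its NDCG equals 1, meaning no result is ranked above a more relevant one. An aggregate such as the reported 0.76 would then be the mean of the per-query values.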
