

M3Retrieve: Benchmarking Multimodal Retrieval for Medicine

October 8, 2025
Authors: Arkadeep Acharya, Akash Ghosh, Pradeepika Verma, Kitsuchart Pasupa, Sriparna Saha, Priti Singh
cs.AI

Abstract

With the increasing use of Retrieval-Augmented Generation (RAG), strong retrieval models have become more important than ever. In healthcare, multimodal retrieval models that combine information from both text and images offer major advantages for many downstream tasks such as question answering, cross-modal retrieval, and multimodal summarization, since medical data often includes both formats. However, there is currently no standard benchmark to evaluate how well these models perform in medical settings. To address this gap, we introduce M3Retrieve, a Multimodal Medical Retrieval Benchmark. M3Retrieve spans 5 domains, 16 medical fields, and 4 distinct tasks, with over 1.2 million text documents and 164K multimodal queries, all collected under approved licenses. We evaluate leading multimodal retrieval models on this benchmark to explore the challenges specific to different medical specialities and to understand their impact on retrieval performance. By releasing M3Retrieve, we aim to enable systematic evaluation, foster model innovation, and accelerate research toward building more capable and reliable multimodal retrieval systems for medical applications. The dataset and the baseline code are available on this GitHub page: https://github.com/AkashGhosh/M3Retrieve.
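
As a concrete illustration of the multimodal retrieval setting the benchmark evaluates, the sketch below ranks text documents against a combined image-plus-text query using an off-the-shelf CLIP encoder from the `transformers` library. This is only an illustrative assumption: the paper does not prescribe CLIP or this fusion scheme, and the file name, query, and candidate documents here are hypothetical placeholders, not data from M3Retrieve.

```python
# Illustrative sketch (not the benchmark's official pipeline): score text
# documents against a multimodal (image + text) query with a CLIP encoder.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical multimodal query: a medical image plus a short text question.
image = Image.open("chest_xray.png")  # placeholder path
query_text = "Findings consistent with pneumonia?"

# Hypothetical candidate text documents from a retrieval corpus.
docs = [
    "Lobar consolidation in the right lower lobe suggests bacterial pneumonia.",
    "Normal cardiac silhouette with no acute osseous abnormality.",
]

with torch.no_grad():
    img_inputs = processor(images=image, return_tensors="pt")
    txt_inputs = processor(
        text=[query_text] + docs, return_tensors="pt", padding=True, truncation=True
    )
    img_emb = model.get_image_features(**img_inputs)
    txt_emb = model.get_text_features(**txt_inputs)

def normalize(x: torch.Tensor) -> torch.Tensor:
    """L2-normalize embeddings along the last dimension."""
    return x / x.norm(dim=-1, keepdim=True)

# Fuse the two query modalities by averaging their normalized embeddings,
# then rank documents by cosine similarity to the fused query vector.
query_emb = normalize(normalize(img_emb) + normalize(txt_emb[:1]))
doc_emb = normalize(txt_emb[1:])
scores = (query_emb @ doc_emb.T).squeeze(0)

for rank, idx in enumerate(scores.argsort(descending=True).tolist(), start=1):
    print(f"{rank}. score={scores[idx].item():.3f}  {docs[idx]}")
```

In practice, a benchmark run would replace the toy query and documents with the released M3Retrieve data and report standard retrieval metrics (e.g., NDCG or recall at a cutoff) per task and medical speciality; see the linked GitHub repository for the authors' baseline code.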