MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval

October 10, 2025
Authors: Siyue Zhang, Yuan Gao, Xiao Zhou, Yilun Zhao, Tingyu Song, Arman Cohan, Anh Tuan Luu, Chen Zhao
cs.AI

Abstract

We introduce MRMR, the first expert-level, multidisciplinary multimodal retrieval benchmark requiring intensive reasoning. MRMR contains 1,502 queries spanning 23 domains, with positive documents carefully verified by human experts. Compared to prior benchmarks, MRMR introduces three key advancements. First, it challenges retrieval systems across diverse areas of expertise, enabling fine-grained model comparison across domains. Second, queries are reasoning-intensive, with images that demand deeper interpretation, such as diagnosing microscopy slides. We further introduce Contradiction Retrieval, a novel task requiring models to identify conflicting concepts. Finally, queries and documents are constructed as image-text interleaved sequences. Unlike earlier benchmarks restricted to single images or unimodal documents, MRMR offers a realistic setting with multi-image queries and a mixed-modality document corpus. We conduct an extensive evaluation of 4 categories of multimodal retrieval systems and 14 frontier models on MRMR. The text embedding model Qwen3-Embedding, paired with LLM-generated image captions, achieves the highest performance, highlighting substantial room for improving multimodal retrieval models. Although the latest multimodal models such as Ops-MM-Embedding perform competitively on expert-domain queries, they fall short on reasoning-intensive tasks. We believe MRMR paves the way for advancing multimodal retrieval toward more realistic and challenging scenarios.
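To make the winning baseline concrete, below is a minimal sketch of a caption-then-embed retrieval pipeline of the kind the abstract describes: each image in an interleaved query or document is replaced by an LLM-generated caption, the flattened text is embedded with a text embedding model, and documents are ranked by cosine similarity. The `caption_image` helper, the `flatten` representation of interleaved sequences, and the specific checkpoint name (`Qwen/Qwen3-Embedding-0.6B`) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a caption-then-embed retrieval baseline.
# Assumptions (not from the paper): the captioning helper, the
# interleaved-sequence format, and the exact model checkpoint.

from sentence_transformers import SentenceTransformer


def caption_image(image_path: str) -> str:
    """Hypothetical helper: call any LLM/VLM of your choice to
    produce a text caption for the image at `image_path`."""
    raise NotImplementedError("Plug in an image captioner here.")


def flatten(segments: list[tuple[str, str]]) -> str:
    """Convert an image-text interleaved sequence into plain text
    by replacing each ("image", path) segment with its caption."""
    parts = []
    for kind, content in segments:
        parts.append(caption_image(content) if kind == "image" else content)
    return " ".join(parts)


# Text-only embedding model; Qwen3-Embedding checkpoints are
# published on Hugging Face in several sizes (0.6B used here only
# as an example).
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")


def retrieve(query_segments, corpus, top_k=10):
    """Rank corpus documents (each an interleaved sequence) against
    a query by cosine similarity of flattened-text embeddings."""
    query_emb = model.encode([flatten(query_segments)],
                             normalize_embeddings=True)
    doc_embs = model.encode([flatten(doc) for doc in corpus],
                            normalize_embeddings=True)
    scores = (query_emb @ doc_embs.T)[0]  # cosine: vectors are normalized
    ranked = scores.argsort()[::-1][:top_k]
    return ranked, scores[ranked]
```

The design point this sketch illustrates is why such a pipeline can beat native multimodal embedders on reasoning-heavy queries: the captioning LLM performs the image interpretation up front, so the retriever only has to match text against text.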