
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction

September 22, 2025
作者: Zilin Xiao, Qi Ma, Mengting Gu, Chun-cheng Jason Chen, Xintao Chen, Vicente Ordonez, Vijai Mohan
cs.AI

Abstract

Universal multimodal embedding models have achieved great success in capturing semantic relevance between queries and candidates. However, current methods either condense queries and candidates into a single vector, potentially limiting the expressiveness for fine-grained information, or produce too many vectors that are prohibitively expensive for multi-vector retrieval. In this work, we introduce MetaEmbed, a new framework for multimodal retrieval that rethinks how multimodal embeddings are constructed and interacted with at scale. During training, a fixed number of learnable Meta Tokens are appended to the input sequence. At test-time, their last-layer contextualized representations serve as compact yet expressive multi-vector embeddings. Through the proposed Matryoshka Multi-Vector Retrieval training, MetaEmbed learns to organize information by granularity across multiple vectors. As a result, we enable test-time scaling in multimodal retrieval, where users can balance retrieval quality against efficiency demands by selecting the number of tokens used for indexing and retrieval interactions. Extensive evaluations on the Massive Multimodal Embedding Benchmark (MMEB) and the Visual Document Retrieval Benchmark (ViDoRe) confirm that MetaEmbed achieves state-of-the-art retrieval performance while scaling robustly to models with 32B parameters.
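The test-time scaling described above can be illustrated with a small sketch of late-interaction scoring: each query and candidate is represented by a fixed set of Meta Token embeddings, and the retrieval budget selects how many of the leading (Matryoshka-ordered) vectors participate in the similarity computation. The function name, the ColBERT-style MaxSim aggregation, and the exact tensor shapes are assumptions for illustration; the abstract does not specify the paper's precise scoring formula.

```python
import numpy as np

def late_interaction_score(query_vecs, cand_vecs, budget):
    """Hypothetical MaxSim-style late-interaction score.

    query_vecs: (Q, d) Meta Token embeddings for the query
    cand_vecs:  (C, d) Meta Token embeddings for the candidate
    budget:     number of leading vectors to use on each side
                (the Matryoshka prefix), trading quality for cost.

    Rows are assumed L2-normalized, so dot products are cosine
    similarities. Each retained query vector is matched to its
    best-matching candidate vector, and the matches are summed.
    """
    q = query_vecs[:budget]        # keep only the first `budget` vectors
    c = cand_vecs[:budget]
    sims = q @ c.T                 # (budget, budget) cosine similarity matrix
    return float(sims.max(axis=1).sum())  # best match per query vector, summed
```

A smaller `budget` shrinks the similarity matrix quadratically, which is the efficiency knob the abstract describes: the same indexed embeddings can be scored with fewer vectors at retrieval time, with no re-encoding required.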