
Your Embedding Model is SMARTer Than You Think

May 24, 2026
Authors: Jianrui Zhang, Hyun Jung Lee, Sukanta Ganguly, Tae-Eui Kam, Donghyun Kim, Yong Jae Lee
cs.AI

Abstract

Multimodal retrieval relies heavily on single-vector retrievers, which compress rich token sequences into a single global representation. While efficient, they discard the fine-grained, local evidence that is critical for dense retrieval tasks. Multi-vector approaches were introduced as a solution, but they strictly require training, and many ignore the need for a globally summarizing representation. To address this, we introduce SMART, a framework that unlocks the latent multi-vector capabilities of standard single-vector models. We first demonstrate that standard contrastive training on the pooled embedding implicitly shapes the retrieval geometry of the preceding hidden states via gradient flow. By applying late interaction directly over these frozen hidden states during inference, SMART acts as a plug-and-play upgrade that consistently improves performance across diverse modalities, further improving even state-of-the-art models on MMEB-V2. We also show that simple, lightweight post-training with SMART not only saves time and compute but also brings further improvement on visual document retrieval, allowing a single-vector model to outperform SoTA multi-vector counterparts. Ultimately, SMART offers both a highly efficient inference-time enhancement and a powerful finetuning technique for multimodal retrieval. We open-source our code and weights at https://github.com/HanSolo9682/SMART.
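To make the contrast between pooled and late-interaction scoring concrete, here is a minimal pure-Python sketch. It is not taken from the SMART repository. It assumes mean pooling for the single-vector score and a ColBERT-style MaxSim sum for late interaction, neither of which the abstract specifies; the function names and toy vectors are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(hidden_states):
    """Average token-level hidden states into one vector (assumed pooling)."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / n for d in range(dim)]

def single_vector_score(query_states, doc_states):
    """Cosine similarity between the pooled query and pooled document."""
    q = l2_normalize(mean_pool(query_states))
    d = l2_normalize(mean_pool(doc_states))
    return dot(q, d)

def maxsim_score(query_states, doc_states):
    """Late interaction (assumed MaxSim form): for each query token, take its
    best cosine match among the document tokens, then sum over query tokens."""
    q_tokens = [l2_normalize(t) for t in query_states]
    d_tokens = [l2_normalize(t) for t in doc_states]
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)

# Toy hidden states (2-dimensional, hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains both query directions
doc_b = [[1.0, 1.0], [1.0, 1.0]]              # only the "average" direction

pooled_a = single_vector_score(query, doc_a)
pooled_b = single_vector_score(query, doc_b)
late_a = maxsim_score(query, doc_a)
late_b = maxsim_score(query, doc_b)

print(pooled_a, pooled_b)  # both approximately 1.0: pooling cannot tell them apart
print(late_a, late_b)      # 2.0 versus roughly 1.414: token-level matching can
```

In this toy case both documents pool to the same direction, so the single-vector scores tie, while the token-level score prefers the document whose individual tokens match the query tokens. This illustrates the kind of local evidence the abstract says is lost under pooling; how SMART actually selects, normalizes, or combines hidden states is described in the paper and repository, not here.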