Your Embedding Model is SMARTer Than You Think

May 24, 2026
Authors: Jianrui Zhang, Hyun Jung Lee, Sukanta Ganguly, Tae-Eui Kam, Donghyun Kim, Yong Jae Lee
cs.AI

Abstract

Multimodal retrieval relies heavily on single-vector retrievers, which compress rich token sequences into a single global representation. While efficient, they discard the fine-grained local evidence that is critical for dense retrieval tasks. Multi-vector approaches were introduced as a solution, but they strictly require training, and many ignore the need for a globally summarizing representation. To address this, we introduce SMART, a framework that unlocks the latent multi-vector capabilities of standard single-vector models. We first demonstrate that standard contrastive training on the pooled embedding implicitly shapes the retrieval geometry of the preceding hidden states via gradient flow. By applying late interaction directly over these frozen hidden states at inference time, SMART acts as a plug-and-play upgrade that consistently improves performance across diverse modalities and further improves even state-of-the-art models on MMEB-V2. We also show that SMART benefits from simple, lightweight post-training, which not only saves time and compute but also yields further gains on visual document retrieval, allowing a single-vector model to outperform state-of-the-art multi-vector counterparts. Ultimately, SMART offers both a highly efficient inference-time enhancement and a powerful fine-tuning technique for multimodal retrieval. We open-source our code and weights at https://github.com/HanSolo9682/SMART.
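
To make the contrast between pooled scoring and late interaction concrete, here is a minimal, dependency-free sketch. It is not taken from the SMART repository: the function names are hypothetical, mean pooling is assumed as the pooling operator, and the late-interaction score is assumed to be a ColBERT-style sum of per-query-token maximum cosine similarities. The actual pooling and scoring used by SMART may differ.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(hidden_states):
    # Assumption: the single-vector embedding is the mean of the token states.
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / n for d in range(dim)]

def single_vector_score(query_states, doc_states):
    # Cosine similarity between the two pooled embeddings.
    q = l2_normalize(mean_pool(query_states))
    d = l2_normalize(mean_pool(doc_states))
    return dot(q, d)

def late_interaction_score(query_states, doc_states):
    # Assumption: MaxSim-style scoring. Each query token is matched to its
    # most similar document token, and the per-token maxima are summed.
    q_tokens = [l2_normalize(t) for t in query_states]
    d_tokens = [l2_normalize(t) for t in doc_states]
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)

# Toy hidden states (2 tokens, 2 dimensions), invented for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # contains an exact match for each query token
doc_b = [[1.0, 1.0], [1.0, 1.0]]   # same pooled direction, no token-level match

print(single_vector_score(query, doc_a), single_vector_score(query, doc_b))
print(late_interaction_score(query, doc_a), late_interaction_score(query, doc_b))
```

In this toy case both documents pool to the same direction, so the single-vector scores are equal up to floating-point error, while the late-interaction score is higher for `doc_a`, whose individual tokens match the query tokens. This is the kind of local evidence the abstract says pooling discards.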