Multi-Vector Index Compression in Any Modality
February 24, 2026
Authors: Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
cs.AI
Abstract
We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage costs grow linearly with document length, making it costly for image-, video-, and audio-rich corpora. To address this limitation, we explore query-agnostic methods for compressing multi-vector document representations under a constant vector budget. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation. Evaluating these methods on retrieval tasks spanning text (BEIR), visual-document (ViDoRe), and video (MSR-VTT, MultiVENT 2.0), we show that attention-guided clustering consistently outperforms other parameterized compression methods (sequence resizing and memory tokens), provides greater flexibility in index size than non-parametric hierarchical clustering, and achieves competitive or improved performance compared to a full, uncompressed index. The source code is available at: github.com/hanxiangqin/omni-col-press.
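The abstract describes attention-guided clustering only at a high level: the most salient tokens serve as cluster centroids, and the remaining tokens are aggregated into them with attention-derived weights. The sketch below is an illustrative reconstruction of that idea, not the paper's implementation; the function name, the use of cosine similarity for assignment, and the assumption of a single precomputed salience score per token are all my own simplifications.

```python
import numpy as np

def attention_guided_clustering(tokens, attn, k):
    """Compress L token vectors to a fixed budget of k vectors.

    tokens: (L, d) document token embeddings
    attn:   (L,) positive attention/salience score per token (assumed given)
    k:      fixed vector budget (k < L)
    """
    # 1) Take the k most salient tokens as cluster centroids.
    centroid_idx = np.argsort(attn)[-k:]
    centroids = tokens[centroid_idx]                      # (k, d)

    # 2) Assign every token to its nearest centroid by cosine similarity.
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    assign = (t @ c.T).argmax(axis=1)                     # (L,)

    # 3) Attention-weighted aggregation within each cluster.
    #    Each centroid token assigns to itself, so no cluster is empty.
    compressed = np.zeros_like(centroids)
    for j in range(k):
        mask = assign == j
        w = attn[mask]
        compressed[j] = (w[:, None] * tokens[mask]).sum(axis=0) / w.sum()
    return compressed
```

The query-agnostic property the abstract emphasizes shows up here in that compression depends only on the document's own tokens and attention scores: the k output vectors can be indexed offline and scored with standard late-interaction (MaxSim) at query time.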