Multi-Vector Index Compression in Any Modality
February 24, 2026
Authors: Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme
cs.AI
Abstract
We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage costs grow linearly with document length, making it costly for image-, video-, and audio-rich corpora. To address this limitation, we explore query-agnostic methods for compressing multi-vector document representations under a constant vector budget. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation. Evaluating these methods on retrieval tasks spanning text (BEIR), visual-document (ViDoRe), and video (MSR-VTT, MultiVENT 2.0), we show that attention-guided clustering consistently outperforms other parameterized compression methods (sequence resizing and memory tokens), provides greater flexibility in index size than non-parametric hierarchical clustering, and achieves competitive or improved performance compared to a full, uncompressed index. The source code is available at: github.com/hanxiangqin/omni-col-press.
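To make the setting concrete, here is a minimal sketch in plain Python of the two ideas the abstract relies on: late-interaction scoring and compression of a document to a fixed vector budget. `maxsim` follows the usual late-interaction rule (each query vector takes its best-matching document vector, and the matches are summed), so its cost grows with the number of stored document vectors. `compress_to_budget` is only an illustration of salience-seeded clustering with weighted aggregation; it is not the paper's AGC implementation. The `salience` input, the choice of the top-scoring tokens as cluster seeds, and dot-product assignment are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: sum over query vectors of the best document match."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def compress_to_budget(doc_vecs, salience, budget):
    """Hypothetical salience-guided compression to `budget` vectors.

    salience: one positive score per document vector. Here it is assumed
    to come from some attention signal; the source abstract does not
    specify how it is computed.
    """
    if not 0 < budget <= len(doc_vecs):
        raise ValueError("budget must be between 1 and the number of vectors")
    dim = len(doc_vecs[0])
    # Assumption: the `budget` most salient tokens seed the clusters.
    order = sorted(range(len(doc_vecs)), key=lambda i: salience[i], reverse=True)
    seeds = order[:budget]
    seed_cluster = {token: cluster for cluster, token in enumerate(seeds)}
    sums = [[0.0] * dim for _ in seeds]
    for i, vec in enumerate(doc_vecs):
        if i in seed_cluster:
            # A seed token always stays in its own cluster.
            cluster = seed_cluster[i]
        else:
            # Other tokens join the seed they are most similar to.
            cluster = max(range(budget), key=lambda j: dot(vec, doc_vecs[seeds[j]]))
        # Salience-weighted aggregation of the cluster members.
        for k in range(dim):
            sums[cluster][k] += salience[i] * vec[k]
    return [normalize(s) for s in sums]
```

Because the output always has exactly `budget` vectors, index size and `maxsim` cost per document no longer depend on document length, which is the trade-off the abstract describes. Scoring a query against the compressed vectors uses the same `maxsim` call as the full index.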