Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling

Abstract

Over the last few years, multi-vector retrieval methods, spearheaded by ColBERT, have become an increasingly popular approach to Neural IR. By storing representations at the token level rather than at the document level, these methods have demonstrated very strong retrieval performance, especially in out-of-domain settings. However, the storage and memory required to hold the large number of associated vectors remain an important drawback, hindering practical adoption. In this paper, we introduce a simple clustering-based token pooling approach to aggressively reduce the number of vectors that need to be stored. This method can reduce the space and memory footprint of ColBERT indexes by 50% with virtually no degradation in retrieval performance. It also allows for further reductions, cutting the vector count by 66% to 75% with degradation remaining below 5% on the vast majority of datasets. Importantly, this approach requires neither architectural changes nor query-time processing, and can be used as a simple drop-in during indexing with any ColBERT-like model.
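
To make the idea of clustering-based token pooling concrete, here is a minimal sketch in pure Python. The abstract does not say which clustering algorithm is used, so this example assumes a simple stand-in: greedy agglomerative merging of the two most cosine-similar clusters, followed by mean pooling of each cluster. The names `pool_tokens` and `pool_factor` are hypothetical. Under this assumption, a pool factor of f keeps roughly 1/f of the vectors, so factors of 2, 3, and 4 would correspond to the 50%, 66%, and 75% reductions mentioned above.

```python
import math


def normalize(vec):
    """Return vec scaled to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two unit-norm vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(members):
    """Average a list of vectors and re-normalize the result."""
    dim = len(members[0])
    mean = [sum(v[i] for v in members) / len(members) for i in range(dim)]
    return normalize(mean)


def pool_tokens(token_vectors, pool_factor):
    """Greedy agglomerative pooling of one document's token vectors.

    Assumption: pool_factor is a hypothetical knob; the target is about
    len(token_vectors) / pool_factor vectors after pooling.
    """
    # Start with one cluster per token vector.
    clusters = [[normalize(v)] for v in token_vectors]
    target = max(1, math.ceil(len(clusters) / pool_factor))
    while len(clusters) > target:
        centroids = [mean_pool(c) for c in clusters]
        best = None
        # Find the pair of clusters whose centroids are most similar.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroids[i], centroids[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters.pop(j))
    # One pooled vector per remaining cluster.
    return [mean_pool(c) for c in clusters]


# Toy document with four 2-dimensional "token" vectors.
doc = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pooled = pool_tokens(doc, pool_factor=2)
print(len(doc), "->", len(pooled))
```

Because pooling in this sketch only touches the stored document vectors, it would run once at indexing time and leave query encoding and scoring untouched, which matches the drop-in property the abstract describes. The quadratic pair search is for readability only; a real index builder would likely rely on an optimized clustering library.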
