Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report

Abstract

In this work, we introduce the mxbai-edge-colbert-v0 models at two parameter counts: 17M and 32M. As part of our research, we conduct numerous experiments to improve retrieval and late-interaction models, and we intend to distill the results into smaller models as proofs of concept. Our ultimate aim is to support retrieval at all scales, from large-scale retrieval that lives in the cloud to models that can run locally on any device. We hope mxbai-edge-colbert-v0 will serve as a solid backbone for all future experiments; it is the first version in a long series of small proofs of concept. During its development we conducted multiple ablation studies, whose results we report. In terms of downstream performance, mxbai-edge-colbert-v0 is a particularly capable small model: it outperforms ColBERTv2 on common short-text benchmarks (BEIR) and represents a large step forward on long-context tasks, with unprecedented efficiency.
