Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report

October 16, 2025
作者: Rikiya Takehi, Benjamin Clavié, Sean Lee, Aamir Shakir
cs.AI

Abstract

In this work, we introduce the mxbai-edge-colbert-v0 models at two parameter counts: 17M and 32M. As part of our research, we conduct numerous experiments to improve retrieval and late-interaction models, which we intend to distill into smaller models as proofs of concept. Our ultimate aim is to support retrieval at all scales, from large-scale retrieval that lives in the cloud to models that can run locally on any device. mxbai-edge-colbert-v0 is a model that we hope will serve as a solid foundation for all future experiments, representing the first version of a long series of small proof-of-concept models. As part of the development of mxbai-edge-colbert-v0, we conducted multiple ablation studies, whose results we report here. In terms of downstream performance, mxbai-edge-colbert-v0 is a particularly capable small model, outperforming ColBERTv2 on common short-text benchmarks (BEIR) and representing a large step forward in long-context tasks, with unprecedented efficiency.
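For readers unfamiliar with late-interaction retrieval, the core scoring mechanism used by ColBERT-style models such as mxbai-edge-colbert-v0 is the MaxSim operator: queries and documents are encoded as per-token embedding matrices, and each query token is matched against its most similar document token. The sketch below is a minimal NumPy illustration of that operator, not the model's actual implementation (the function name and shapes are assumptions for exposition):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late-interaction (MaxSim) relevance score.

    query_vecs: (num_query_tokens, dim) L2-normalized token embeddings
    doc_vecs:   (num_doc_tokens, dim)   L2-normalized token embeddings

    Each query token contributes the maximum cosine similarity it
    achieves against any document token; the final score is the sum
    of these per-token maxima.
    """
    sim = query_vecs @ doc_vecs.T        # (q_tokens, d_tokens) cosine sims
    return float(sim.max(axis=1).sum())  # best match per query token, summed


# Toy usage: two orthonormal query tokens against two candidate documents.
q = np.eye(2)                       # query token embeddings
doc_full = np.eye(2)                # covers both query tokens
doc_partial = np.array([[0.0, 1.0]])  # covers only the second token

print(maxsim_score(q, doc_full))     # higher score
print(maxsim_score(q, doc_partial))  # lower score
```

Because scoring decomposes into per-token maxima over precomputed document embeddings, document encoding can be done offline and only the cheap similarity/max/sum runs at query time, which is what makes this architecture attractive for small, on-device models.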