Trove: A Flexible Toolkit for Dense Retrieval
November 3, 2025
Authors: Reza Esfandiarpoor, Max Zuo, Stephen H. Bach
cs.AI
Abstract
We introduce Trove, an easy-to-use open-source retrieval toolkit that
simplifies research experiments without sacrificing flexibility or speed. For
the first time, we introduce efficient data management features that load and
process (filter, select, transform, and combine) retrieval datasets on the fly,
with just a few lines of code. This gives users the flexibility to easily
experiment with different dataset configurations without the need to compute
and store multiple copies of large datasets. Trove is highly customizable: in
addition to many built-in options, it allows users to freely modify existing
components or replace them entirely with user-defined objects. It also provides
a low-code and unified pipeline for evaluation and hard negative mining, which
supports multi-node execution without any code changes. Trove's data management
features reduce memory consumption by a factor of 2.6. Moreover, Trove's
easy-to-use inference pipeline incurs no overhead, and inference times decrease
linearly with the number of available nodes. Most importantly, we demonstrate
how Trove simplifies retrieval experiments and allows for arbitrary
customizations, thus facilitating exploratory research.
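
To make the idea of on-the-fly data processing concrete, here is a minimal sketch in plain Python. It does not use Trove's API, which the abstract does not describe; every name in it (`lazy_filter`, `lazy_transform`, `lazy_combine`, `build_view`, and the record fields `qid`, `docid`, `score`) is a hypothetical stand-in. It only illustrates the general technique of filtering, transforming, and combining retrieval records lazily, so that a new dataset configuration is a cheap view over the sources rather than another stored copy.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical record layout: query id, document id, relevance score.
Record = Dict[str, object]


def lazy_filter(records: Iterable[Record], keep: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield only the records that satisfy `keep`, one at a time."""
    return (r for r in records if keep(r))


def lazy_transform(records: Iterable[Record], fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Apply `fn` to each record as it is consumed."""
    return (fn(r) for r in records)


def lazy_combine(*sources: Iterable[Record]) -> Iterator[Record]:
    """Concatenate several record streams without materializing them."""
    return chain(*sources)


# Toy sources (assumed data, for illustration only).
annotated = [
    {"qid": "q1", "docid": "d1", "score": 3},
    {"qid": "q1", "docid": "d2", "score": 0},
    {"qid": "q2", "docid": "d3", "score": 2},
]
mined = [
    {"qid": "q1", "docid": "d9", "score": 0.71},
    {"qid": "q2", "docid": "d8", "score": 0.15},
]


def build_view() -> Iterator[Record]:
    """One dataset configuration, expressed as a lazy view over the sources."""
    # Keep annotated pairs with relevance >= 2 as positives.
    positives = lazy_filter(annotated, lambda r: r["score"] >= 2)
    # Keep high-similarity mined pairs and relabel them as negatives (score 0).
    # The transform builds a new dict, so the source records are not mutated.
    negatives = lazy_transform(
        lazy_filter(mined, lambda r: r["score"] > 0.5),
        lambda r: {**r, "score": 0},
    )
    return lazy_combine(positives, negatives)


for record in build_view():
    print(record["qid"], record["docid"], record["score"])
```

Changing a threshold or adding a source only changes `build_view`; the underlying collections stay untouched, which is the property the abstract attributes to processing datasets on the fly.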