

Trove: A Flexible Toolkit for Dense Retrieval

November 3, 2025
Authors: Reza Esfandiarpoor, Max Zuo, Stephen H. Bach
cs.AI

Abstract

We introduce Trove, an easy-to-use open-source retrieval toolkit that simplifies research experiments without sacrificing flexibility or speed. For the first time, we introduce efficient data management features that load and process (filter, select, transform, and combine) retrieval datasets on the fly, with just a few lines of code. This gives users the flexibility to easily experiment with different dataset configurations without the need to compute and store multiple copies of large datasets. Trove is highly customizable: in addition to many built-in options, it allows users to freely modify existing components or replace them entirely with user-defined objects. It also provides a low-code and unified pipeline for evaluation and hard negative mining, which supports multi-node execution without any code changes. Trove's data management features reduce memory consumption by a factor of 2.6. Moreover, Trove's easy-to-use inference pipeline incurs no overhead, and inference times decrease linearly with the number of available nodes. Most importantly, we demonstrate how Trove simplifies retrieval experiments and allows for arbitrary customizations, thus facilitating exploratory research.
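
The idea of filtering, transforming, and combining retrieval data on the fly, rather than materializing a new copy for each configuration, can be illustrated with plain Python generators. This is only a minimal sketch of the general technique and does not use or reproduce Trove's API: the helper names (`lazy_filter`, `lazy_transform`, `lazy_combine`, `build_view`), the record fields, and the sample data are all hypothetical.

```python
from itertools import chain
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical record layout: one relevance judgment per dict.
Record = Dict[str, Any]


def lazy_filter(records: Iterable[Record], keep: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield only the records for which keep(record) is true."""
    return (r for r in records if keep(r))


def lazy_transform(records: Iterable[Record], fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Yield fn(record) for each record without modifying the source."""
    return (fn(r) for r in records)


def lazy_combine(*sources: Iterable[Record]) -> Iterator[Record]:
    """Chain several sources into one stream without copying them."""
    return chain(*sources)


# Hypothetical sample data standing in for two retrieval datasets.
source_a = [
    {"query_id": "q1", "doc_id": "d1", "score": 3},
    {"query_id": "q1", "doc_id": "d2", "score": 0},
]
source_b = [
    {"query_id": "q2", "doc_id": "d7", "score": 1},
]


def build_view() -> Iterator[Record]:
    """Describe one dataset configuration; nothing is computed until iteration."""
    positives = lazy_filter(source_a, lambda r: r["score"] >= 1)
    relabeled = lazy_transform(source_b, lambda r: {**r, "score": r["score"] * 3})
    return lazy_combine(positives, relabeled)


for record in build_view():
    print(record)
```

Because each configuration is just a description of how to read the sources, trying a different filter or label mapping means changing a lambda rather than writing another processed copy to disk. A real toolkit would add caching and memory-mapped storage on top of this, which the sketch leaves out.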