RL-Index: Reinforcement Learning for Retrieval Index Reasoning

Abstract



Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit, complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems that rely on the same theorem, or coding tasks that require deep reasoning). Existing approaches rely primarily on query-side reasoning (e.g., query rewriting), which adds significant online latency and underutilizes the opportunity to reason over the knowledge corpus itself (i.e., index-side reasoning). In this paper, we propose RL-Index, an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. Instead of reasoning at query time, RL-Index shifts reasoning to the indexing stage by augmenting documents with LLM-generated rationales that explicitly encode the latent query-knowledge relationship. To optimize the quality of these rationales, we employ Group Relative Policy Optimization (GRPO) with retrieval similarity as a verifiable reward signal, which directly optimizes indexing decisions for retrieval effectiveness. Extensive experiments on the BRIGHT benchmark demonstrate that RL-Index consistently improves both retrieval and downstream question-answering performance while significantly reducing online inference latency. Moreover, the learned rationale augmentation generalizes across diverse retrievers and generators, highlighting its robustness as a plug-and-play indexing strategy for different retrieval systems.
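The abstract does not specify how the reward is computed or how rationales are attached to documents. The sketch below is only an illustration under stated assumptions. It assumes the reward for a sampled rationale is the cosine similarity between a query embedding and the embedding of the rationale-augmented document. It also assumes the standard GRPO advantage, in which each reward in a group of rationales sampled for the same document is normalized by that group's mean and standard deviation. The names `augment_document`, `retrieval_rewards`, and `group_relative_advantages` are hypothetical. The "Rationale:" text format is likewise an assumption, and embeddings are taken as given vectors rather than produced by any particular retriever.

```python
import math
from statistics import mean, pstdev


def augment_document(doc: str, rationale: str) -> str:
    """Hypothetical index-side augmentation: append an LLM rationale to the document text."""
    return f"{doc}\n\nRationale: {rationale}"


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieval_rewards(query_emb: list[float], augmented_doc_embs: list[list[float]]) -> list[float]:
    """Assumed reward: similarity between the query and each rationale-augmented document."""
    return [cosine_similarity(query_emb, d) for d in augmented_doc_embs]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps).

    Uses the population standard deviation; eps avoids division by zero
    when every rationale in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

As a toy usage example, take a query embedding `[1.0, 0.0]` and three candidate augmented-document embeddings `[1.0, 0.0]`, `[0.0, 1.0]`, and `[1.0, 1.0]`. The rewards are then about `1.0`, `0.0`, and `0.707`. The first rationale receives the largest positive advantage and the second a negative one, so a policy update would favor rationales that pull the document closer to its queries. Because advantages are computed within each group, no learned value model is needed. Avoiding one is the main design motivation of GRPO.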