RL-Index: Reinforcement Learning for Retrieval Index Reasoning

June 15, 2026
Authors: Yongjia Lei, Nedim Lipka, Zhisheng Qi, Utkarsh Sahu, Koustava Goswami, Franck Dernoncourt, Ryan A. Rossi, Yu Wang
cs.AI

Abstract

Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., mathematical problems relying on the same theorem or coding tasks that require deep reasoning). Existing approaches primarily rely on query-side reasoning (e.g., query rewriting), which introduces significant online latency and underutilizes the opportunity to perform reasoning over the knowledge corpus itself (i.e., index-side reasoning). In this paper, we propose RL-Index, an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. Instead of performing reasoning at query time, RL-Index shifts reasoning to the indexing stage by augmenting documents with LLM-generated rationales that explicitly encode the latent query-knowledge relationship. To optimize the quality of these rationales, we employ Group Relative Policy Optimization (GRPO) and use retrieval similarity as a verifiable reward signal, enabling direct optimization of indexing decisions for retrieval effectiveness. Extensive experiments on the BRIGHT benchmark demonstrate that RL-Index consistently improves both retrieval and downstream question-answering performance, while significantly reducing online inference latency. Moreover, the learned rationale augmentation generalizes across diverse retrievers and generators, highlighting its robustness as a plug-and-play indexing strategy across different retrieval systems.
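
To make the reward idea in the abstract concrete, the following is a minimal sketch of how a retrieval-similarity reward and group-relative advantages might fit together. It is not the authors' implementation. The bag-of-words `embed` function is a toy stand-in for a real retriever encoder, and the names `retrieval_reward` and `group_relative_advantages`, the example strings, and the choice to concatenate the rationale to the document are all assumptions made for illustration. The policy-gradient update that GRPO would apply with these advantages is omitted.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense retriever encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_reward(query, document, rationale):
    """Reward: similarity between the query and the rationale-augmented document."""
    return cosine(embed(query), embed(document + " " + rationale))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical training example: one document, one query that should retrieve it,
# and a group of candidate rationales sampled from the indexing policy.
query = "prove the bound using the triangle inequality theorem"
document = "for any three points the direct distance is at most the detour"
candidate_rationales = [
    "this passage states the triangle inequality theorem",
    "this passage is about distances",
    "unrelated note on cooking",
]

rewards = [retrieval_reward(query, document, r) for r in candidate_rationales]
advantages = group_relative_advantages(rewards)

for rationale, reward, adv in zip(candidate_rationales, rewards, advantages):
    print(f"{reward:.3f}  {adv:+.3f}  {rationale}")
```

In this sketch, the rationale that names the shared theorem makes the augmented document most similar to the query, so it receives the largest advantage within its group. Because the rationale is generated once at indexing time, no extra model call is needed when a query arrives.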