
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

May 16, 2025
Authors: Yaorui Shi, Shihan Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang
cs.AI

Abstract

Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir. Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources, but existing methods often retrieve irrelevant or noisy information, hindering accurate reasoning. In this paper, we propose AutoRefine, a reinforcement learning post-training framework that adopts a new "search-and-refine-during-think" paradigm. AutoRefine introduces explicit knowledge refinement steps between successive search calls, enabling the model to iteratively filter, distill, and organize evidence before generating an answer. Furthermore, we incorporate tailored retrieval-specific rewards alongside answer correctness rewards using group relative policy optimization. Experiments on single-hop and multi-hop QA benchmarks demonstrate that AutoRefine significantly outperforms existing approaches, particularly in complex, multi-hop reasoning scenarios. Detailed analysis shows that AutoRefine issues frequent, higher-quality searches and synthesizes evidence effectively.
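
The paradigm described above is easiest to picture as a rollout loop. Below is a minimal Python sketch of how "search-and-refine-during-think" might be structured; the tag names (<search>, <documents>, <refine>, <answer>), the search-call budget, and both helper functions are illustrative assumptions based only on the abstract, not the paper's released implementation.

```python
from typing import Callable, List

# Assumed interface: generate_until(prompt, stops) returns the model's
# continuation up to and including the first stop tag it emits, and
# retrieve(query, k) returns k document snippets from an external corpus.

MAX_SEARCH_CALLS = 4  # assumed per-trajectory budget for search calls

def rollout(
    generate_until: Callable[[str, List[str]], str],
    retrieve: Callable[[str, int], List[str]],
    question: str,
) -> str:
    """Roll out one trajectory, interleaving search and refine steps."""
    traj = f"<think>{question}"
    for _ in range(MAX_SEARCH_CALLS):
        # The policy reasons until it either issues a search query or answers.
        seg = generate_until(traj, ["</search>", "</answer>"])
        traj += seg
        if seg.endswith("</answer>"):
            return traj  # final answer emitted; rollout ends here
        # Pull the query out of <search>...</search> and fetch documents.
        query = seg.rsplit("<search>", 1)[-1].removesuffix("</search>").strip()
        docs = retrieve(query, 3)
        traj += "<documents>" + " ".join(docs) + "</documents>"
        # Explicit knowledge-refinement step between successive search calls:
        # the model filters and distills the evidence before moving on.
        traj += "<refine>" + generate_until(traj + "<refine>", ["</refine>"])
    # Search budget exhausted: force a final answer from the refined evidence.
    return traj + "<answer>" + generate_until(traj + "<answer>", ["</answer>"])
```

The key structural difference from plain retrieval-augmented generation is the dedicated <refine> segment after every retrieval, which gives the model a place to discard noisy documents before the next search or the final answer.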
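The training signal can be sketched in the same hedged spirit: group relative policy optimization (GRPO) samples a group of trajectories per question, scores each one, and standardizes the rewards within the group. The exact-match correctness term, the evidence-containment retrieval term, and the 0.5 weighting below are assumed stand-ins for the paper's tailored retrieval-specific rewards, shown only to illustrate how the two signals combine.

```python
import statistics

def trajectory_reward(answer: str, gold: str, refined_evidence: str) -> float:
    """Combine answer correctness with a retrieval-specific bonus (assumed form)."""
    correctness = 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
    # Retrieval-specific reward: credit trajectories whose refined evidence
    # contains the gold answer, even when the final answer is wrong.
    retrieval = 1.0 if gold.lower() in refined_evidence.lower() else 0.0
    return correctness + 0.5 * retrieval  # weighting is an illustrative choice

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: standardize each reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mean) / std for r in rewards]

# Toy usage: score a group of sampled (answer, refined evidence) pairs.
gold = "Paris"
group = [("Paris", "the capital of France is Paris"), ("Lyon", "Lyon is a city")]
advantages = group_relative_advantages(
    [trajectory_reward(ans, gold, ev) for ans, ev in group]
)
```

Because advantages are computed relative to the group rather than a learned value baseline, a trajectory that retrieves and refines useful evidence can be rewarded above its peers even before its final answers become reliably correct.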

