
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

May 16, 2025
Authors: Yaorui Shi, Shihan Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang
cs.AI

Abstract

Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir. Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources, but existing methods often retrieve irrelevant or noisy information, hindering accurate reasoning. In this paper, we propose AutoRefine, a reinforcement learning post-training framework that adopts a new "search-and-refine-during-think" paradigm. AutoRefine introduces explicit knowledge refinement steps between successive search calls, enabling the model to iteratively filter, distill, and organize evidence before generating an answer. Furthermore, we incorporate tailored retrieval-specific rewards alongside answer correctness rewards using group relative policy optimization. Experiments on single-hop and multi-hop QA benchmarks demonstrate that AutoRefine significantly outperforms existing approaches, particularly in complex, multi-hop reasoning scenarios. Detailed analysis shows that AutoRefine issues frequent, higher-quality searches and synthesizes evidence effectively.
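
The abstract describes the search-and-refine loop and its reward design only at a high level. Below is a minimal Python sketch of how such a rollout and a combined reward might look. The <search>/<refine>/<answer> tag conventions, the model/retriever interface, the substring-match correctness check, and the 0.5 reward weighting are all illustrative assumptions, not details taken from the paper.

```python
import re

# Illustrative tag conventions; the abstract does not specify the exact format.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
REFINE_RE = re.compile(r"<refine>(.*?)</refine>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def rollout(model, retriever, question, max_turns=4):
    """Sketch of a search-and-refine-during-think rollout: the model reasons
    freely, may issue <search> queries, and distills each batch of retrieved
    documents in an explicit <refine> step before searching again or answering.
    `model.generate` and `retriever.search` are hypothetical interfaces."""
    trajectory = f"Question: {question}\n"
    for _ in range(max_turns):
        step = model.generate(trajectory, stop=["</search>", "</answer>"])
        if "<answer>" in step:
            trajectory += step + "</answer>"
            break
        match = SEARCH_RE.search(step + "</search>")
        if match is None:
            trajectory += step
            break  # no search issued and no answer: end the rollout
        trajectory += step + "</search>\n"
        docs = retriever.search(match.group(1).strip())  # returns text snippets
        trajectory += f"<documents>{docs}</documents>\n"
        # Explicit knowledge-refinement step between successive search calls.
        refinement = model.generate(trajectory, stop=["</refine>"])
        trajectory += refinement + "</refine>\n"
    return trajectory

def combined_reward(trajectory, gold_answer):
    """Answer-correctness reward plus a retrieval-specific shaping term
    (did any refinement surface the gold answer?). Weighting is assumed."""
    answer = ANSWER_RE.search(trajectory)
    correct = 1.0 if answer and gold_answer.lower() in answer.group(1).lower() else 0.0
    refined = " ".join(REFINE_RE.findall(trajectory))
    retrieval = 1.0 if gold_answer.lower() in refined.lower() else 0.0
    return correct + 0.5 * retrieval

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each rollout's reward against the
    mean and standard deviation of its sampled group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because group relative policy optimization normalizes rewards within each sampled group of rollouts, a scalar reward like the combined one above can be used directly, without training a separate value model.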

