
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

May 16, 2025
Authors: Yaorui Shi, Shihan Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang
cs.AI

Abstract

Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir. Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources, but existing methods often retrieve irrelevant or noisy information, hindering accurate reasoning. In this paper, we propose AutoRefine, a reinforcement learning post-training framework that adopts a new "search-and-refine-during-think" paradigm. AutoRefine introduces explicit knowledge refinement steps between successive search calls, enabling the model to iteratively filter, distill, and organize evidence before generating an answer. Furthermore, we incorporate tailored retrieval-specific rewards alongside answer correctness rewards using group relative policy optimization. Experiments on single-hop and multi-hop QA benchmarks demonstrate that AutoRefine significantly outperforms existing approaches, particularly in complex, multi-hop reasoning scenarios. Detailed analysis shows that AutoRefine issues frequent, higher-quality searches and synthesizes evidence effectively.
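
The abstract describes the search-and-refine loop and its reward design only at a high level. Below is a minimal Python sketch of how such a rollout and a combined reward might look. The <search>/<refine>/<answer> tag conventions, the model/retriever interface, the substring-match correctness check, and the 0.5 reward weighting are all illustrative assumptions, not details taken from the paper.

```python
import re

# Illustrative tag conventions; the abstract does not specify the exact format.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
REFINE_RE = re.compile(r"<refine>(.*?)</refine>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def rollout(model, retriever, question, max_turns=4):
    """Sketch of a search-and-refine-during-think rollout: the model reasons
    freely, may issue <search> queries, and distills each batch of retrieved
    documents in an explicit <refine> step before searching again or answering.
    `model.generate` and `retriever.search` are hypothetical interfaces."""
    trajectory = f"Question: {question}\n"
    for _ in range(max_turns):
        step = model.generate(trajectory, stop=["</search>", "</answer>"])
        if "<answer>" in step:
            trajectory += step + "</answer>"
            break
        match = SEARCH_RE.search(step + "</search>")
        if match is None:
            trajectory += step
            break  # no search issued and no answer: end the rollout
        trajectory += step + "</search>\n"
        docs = retriever.search(match.group(1).strip())  # returns text snippets
        trajectory += f"<documents>{docs}</documents>\n"
        # Explicit knowledge-refinement step between successive search calls.
        refinement = model.generate(trajectory, stop=["</refine>"])
        trajectory += refinement + "</refine>\n"
    return trajectory

def combined_reward(trajectory, gold_answer):
    """Answer-correctness reward plus a retrieval-specific shaping term
    (did any refinement surface the gold answer?). Weighting is assumed."""
    answer = ANSWER_RE.search(trajectory)
    correct = 1.0 if answer and gold_answer.lower() in answer.group(1).lower() else 0.0
    refined = " ".join(REFINE_RE.findall(trajectory))
    retrieval = 1.0 if gold_answer.lower() in refined.lower() else 0.0
    return correct + 0.5 * retrieval

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each rollout's reward against the
    mean and standard deviation of its sampled group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because group relative policy optimization normalizes rewards within each sampled group of rollouts, a scalar reward like the combined one above can be used directly, without training a separate value model.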

