ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
March 25, 2025
Authors: Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Chenzheng Zhu, Fan Yang, Zenan Zhou, Weipeng Chen, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen
cs.AI
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in reasoning,
exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating
reasoning with external search processes remains challenging, especially for
complex multi-hop questions requiring multiple retrieval steps. We propose
ReSearch, a novel framework that trains LLMs to Reason with Search via
reinforcement learning without using any supervised data on reasoning steps.
Our approach treats search operations as integral components of the reasoning
chain, where when and how to perform searches is guided by text-based thinking,
and search results subsequently influence further reasoning. We train ReSearch
on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct
extensive experiments. Despite being trained on only one dataset, our models
demonstrate strong generalizability across various benchmarks. Analysis reveals
that ReSearch naturally elicits advanced reasoning capabilities such as
reflection and self-correction during the reinforcement learning process.
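The reason-with-search loop the abstract describes, where text-based thinking decides when and how to search, and retrieved results feed back into further reasoning, can be sketched as a minimal rollout. The tag names (`<think>`, `<search>`, `<result>`, `<answer>`) and the stub model/retriever functions below are illustrative assumptions, not the paper's actual implementation:

```python
import re

# Hypothetical stand-ins: a real system would call an LLM and a retriever.
def fake_llm(prompt):
    # Toy policy: first think and issue a search, then answer from the result.
    if "<result>" not in prompt:
        return ("<think>I need the capital of France.</think>"
                "<search>capital of France</search>")
    return "<think>The result says Paris.</think><answer>Paris</answer>"

def fake_retriever(query):
    return "Paris is the capital of France."

def rollout(question, max_turns=4):
    """Interleave text-based thinking with search calls until an answer appears.

    The growing `trace` is the reasoning chain: each model turn appends
    thinking plus either a search query or a final answer, and each search
    query is answered by appending a <result>...</result> block.
    """
    trace = question
    for _ in range(max_turns):
        chunk = fake_llm(trace)
        trace += chunk
        ans = re.search(r"<answer>(.*?)</answer>", chunk, re.S)
        if ans:  # reasoning chain is complete
            return ans.group(1).strip(), trace
        q = re.search(r"<search>(.*?)</search>", chunk, re.S)
        if q:  # the model asked for external evidence: retrieve and append it
            trace += f"<result>{fake_retriever(q.group(1).strip())}</result>"
    return None, trace  # no answer within the turn budget
```

In training, completed traces like this would be scored (e.g. by answer correctness) and the score used as a reinforcement-learning reward, so no step-level supervision of the reasoning chain is needed.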