ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
March 25, 2025
Authors: Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Chenzheng Zhu, Fan Yang, Zenan Zhou, Weipeng Chen, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen
cs.AI
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in reasoning,
exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating
reasoning with external search processes remains challenging, especially for
complex multi-hop questions requiring multiple retrieval steps. We propose
ReSearch, a novel framework that trains LLMs to Reason with Search via
reinforcement learning without using any supervised data on reasoning steps.
Our approach treats search operations as integral components of the reasoning
chain, where when and how to perform searches is guided by text-based thinking,
and search results subsequently influence further reasoning. We train ReSearch
on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct
extensive experiments. Despite being trained on only one dataset, our models
demonstrate strong generalizability across various benchmarks. Analysis reveals
that ReSearch naturally elicits advanced reasoning capabilities such as
reflection and self-correction during the reinforcement learning process.
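The abstract describes interleaving search with generation: the model thinks in text, pauses to issue a search query, receives the retrieved results, and resumes reasoning until it produces an answer. A minimal sketch of such a rollout loop is below. The tag names (`<think>`, `<search>`, `<result>`, `<answer>`), the `generate`/`retrieve` callables, and the step limit are illustrative assumptions, not the paper's exact prompt format or implementation.

```python
import re

# Hypothetical tag format for interleaved reasoning and search; the paper's
# actual prompt template may differ.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def rollout(generate, retrieve, question, max_steps=8):
    """Alternate model generation with retrieval: generation pauses at each
    </search> tag, the query is sent to the retriever, and the documents are
    appended as a <result> block before generation resumes. Returns the full
    trace and the final answer (or None if no answer within max_steps)."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        chunk = generate(context)  # model continues from the current trace
        context += chunk
        answer = ANSWER_RE.search(chunk)
        if answer:
            return context, answer.group(1).strip()
        query = SEARCH_RE.search(chunk)
        if query:
            docs = retrieve(query.group(1).strip())
            context += f"<result>{docs}</result>\n"
    return context, None

# Toy stand-ins for an LLM and a retriever, for illustration only.
def toy_generate(ctx):
    if "<result>" not in ctx:
        return ("<think>I need to look this up.</think>"
                "<search>capital of France</search>")
    return "<think>The result says Paris.</think><answer>Paris</answer>"

def toy_retrieve(query):
    return "Paris is the capital of France."

trace, answer = rollout(toy_generate, toy_retrieve,
                        "What is the capital of France?")
```

Because the loop only inspects the generated text for tags, the same rollout can be reused both at inference time and to collect trajectories during reinforcement learning, where a reward on the final answer (rather than supervised reasoning steps) drives training.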