WebThinker: Empowering Large Reasoning Models with Deep Research Capability

April 30, 2025
Authors: Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou
cs.AI

Abstract

Large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, demonstrate impressive long-horizon reasoning capabilities. However, their reliance on static internal knowledge limits their performance on complex, knowledge-intensive tasks and hinders their ability to produce comprehensive research reports requiring synthesis of diverse web information. To address this, we propose WebThinker, a deep research agent that empowers LRMs to autonomously search the web, navigate web pages, and draft research reports during the reasoning process. WebThinker integrates a Deep Web Explorer module, enabling LRMs to dynamically search, navigate, and extract information from the web when encountering knowledge gaps. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. To further enhance research tool utilization, we introduce an RL-based training strategy via iterative online Direct Preference Optimization (DPO). Extensive experiments on complex reasoning benchmarks (GPQA, GAIA, WebWalkerQA, HLE) and scientific report generation tasks (Glaive) demonstrate that WebThinker significantly outperforms existing methods and strong proprietary systems. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems. The code is available at https://github.com/RUC-NLPIR/WebThinker.
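
For readers unfamiliar with the training objective named above, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the WebThinker code: the function name, the argument names, and the value `beta=0.1` are illustrative assumptions, and the iterative online part (resampling preference pairs from the current policy between rounds) is not shown.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence-level log-probabilities under the trainable
    policy and a frozen reference model. All names and the default
    beta are hypothetical, chosen for illustration only.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy shifts probability toward the chosen response: the loss drops.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. In a deep research setting, a "chosen" trajectory might be one that reaches a correct answer with better tool use, but how the pairs are actually built is described in the paper, not here.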