WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Abstract

Large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, demonstrate impressive long-horizon reasoning capabilities. However, their reliance on static internal knowledge limits their performance on complex, knowledge-intensive tasks and hinders their ability to produce comprehensive research reports that require synthesizing diverse web information. To address this, we propose WebThinker, a deep research agent that empowers LRMs to autonomously search the web, navigate web pages, and draft research reports during the reasoning process. WebThinker integrates a Deep Web Explorer module, enabling LRMs to dynamically search, navigate, and extract information from the web when they encounter knowledge gaps. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. To further enhance research tool utilization, we introduce an RL-based training strategy via iterative online Direct Preference Optimization (DPO). Extensive experiments on complex reasoning benchmarks (GPQA, GAIA, WebWalkerQA, HLE) and scientific report generation tasks (Glaive) demonstrate that WebThinker significantly outperforms existing methods and strong proprietary systems. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems. The code is available at https://github.com/RUC-NLPIR/WebThinker.
