WebDancer: Towards Autonomous Information Seeking Agency

Abstract

Addressing intricate real-world problems requires in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential for autonomous multi-step research. In this work, we present a cohesive paradigm for building end-to-end agentic information-seeking agents from a data-centric and training-stage perspective. Our approach consists of four key stages: (1) browsing data construction, (2) trajectory sampling, (3) supervised fine-tuning for an effective cold start, and (4) reinforcement learning for enhanced generalisation. We instantiate this framework as WebDancer, a web agent based on ReAct. Empirical evaluations on the challenging information-seeking benchmarks GAIA and WebWalkerQA demonstrate the strong performance of WebDancer and highlight the efficacy of our training paradigm. Further analysis of agent training provides valuable insights and actionable, systematic pathways for developing more capable agentic models. The code and demo will be released at https://github.com/Alibaba-NLP/WebAgent.
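For readers unfamiliar with ReAct, the following is a minimal sketch of a ReAct-style loop, in which the agent alternates between a thought, a tool action, and the resulting observation until it emits an answer. It is an illustration only and not WebDancer's implementation: the `search` and `visit` tools, the `example.org` URL, and the `scripted_policy` function (a stand-in for the language model) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tools. A real agent would query a search engine and fetch pages.
def search(query: str) -> str:
    corpus = {"capital of France": "https://example.org/france"}
    return corpus.get(query, "no results")

def visit(url: str) -> str:
    pages = {"https://example.org/france": "Paris is the capital of France."}
    return pages.get(url, "page not found")

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "visit": visit}

Step = Tuple[str, str, str]         # (thought, action name, action argument)
Record = Tuple[str, str, str, str]  # step plus the observation it produced

def scripted_policy(question: str, history: List[Record]) -> Step:
    """Stand-in for the language model: picks the next step from the history."""
    if not history:
        return ("I should search first.", "search", "capital of France")
    if len(history) == 1:
        # Use the previous observation (a URL) as the argument of the next action.
        return ("I should open the result.", "visit", history[0][3])
    return ("I have enough information.", "answer", history[-1][3])

def react_loop(question: str,
               policy: Callable[[str, List[Record]], Step],
               max_steps: int = 5) -> Tuple[str, List[Record]]:
    """Alternate thought -> action -> observation until the policy answers."""
    history: List[Record] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, history)
        if action == "answer":
            return arg, history
        observation = TOOLS[action](arg)
        history.append((thought, action, arg, observation))
    return "no answer within step budget", history

answer, trajectory = react_loop("What is the capital of France?", scripted_policy)
```

The returned `trajectory` is the kind of thought, action, and observation record that the trajectory sampling stage would collect, although the exact format used for training is not specified in the abstract.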
