WebDancer: Towards Autonomous Information Seeking Agency

Abstract

Addressing intricate real-world problems requires in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential of autonomous multi-step research. In this work, we present a cohesive paradigm for building end-to-end information-seeking agents from a data-centric and training-stage perspective. Our approach consists of four key stages: (1) browsing data construction, (2) trajectory sampling, (3) supervised fine-tuning for an effective cold start, and (4) reinforcement learning for enhanced generalisation. We instantiate this framework in WebDancer, a web agent built on the ReAct framework. Empirical evaluations on two challenging information-seeking benchmarks, GAIA and WebWalkerQA, demonstrate WebDancer's strong performance and highlight the efficacy of our training paradigm. Further analysis of agent training provides valuable insights and actionable, systematic pathways for developing more capable agentic models. The code and demo will be released at https://github.com/Alibaba-NLP/WebAgent.
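The ReAct framework interleaves a reasoning step ("Thought"), a tool call ("Action"), and the tool's result ("Observation") until the agent commits to an answer. The following is a minimal sketch of that loop under stated assumptions: the `search`/`visit`/`answer` action names, the `react_loop` and `scripted_policy` helpers, and the toy tools are hypothetical illustrations, not WebDancer's released code. In a real agent the policy would be the fine-tuned language model and the tools would call live web services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One Thought -> Action -> Observation turn of a ReAct trajectory."""
    thought: str
    action: str
    argument: str
    observation: str = ""


# A policy maps (question, trajectory so far) to (thought, action, argument).
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]


def react_loop(
    question: str,
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[Optional[str], List[Step]]:
    """Run the ReAct loop until the policy emits the 'answer' action."""
    trajectory: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, trajectory)
        if action == "answer":
            # Terminal step: no tool call, so no observation is recorded.
            trajectory.append(Step(thought, action, argument))
            return argument, trajectory
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown tool: {action}"
        trajectory.append(Step(thought, action, argument, observation))
    return None, trajectory  # step budget exhausted without an answer


# Hypothetical toy tools standing in for real web search and page visits.
TOY_TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: "Top result: https://example.org/webdancer",
    "visit": lambda url: (
        "Page text: WebDancer is a web agent." if "webdancer" in url else "404"
    ),
}


def scripted_policy(question: str, trajectory: List[Step]) -> Tuple[str, str, str]:
    """Hard-coded stand-in for an LLM policy: search, then visit, then answer."""
    if not trajectory:
        return ("I should search first.", "search", question)
    if len(trajectory) == 1:
        return ("Open the top result.", "visit", "https://example.org/webdancer")
    return ("The page answers the question.", "answer", "a web agent")
```

Calling `react_loop("What is WebDancer?", scripted_policy, TOY_TOOLS)` returns the answer `"a web agent"` together with a three-step trajectory (search, visit, answer). Trajectories of this shape are what a sampling stage would collect for supervised fine-tuning; the `max_steps` cap is an illustrative safeguard so that a policy that never answers still terminates.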
