WebDancer: Towards Autonomous Information Seeking Agency

Abstract

Addressing intricate real-world problems requires in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential of autonomous multi-step research. In this work, we present a cohesive paradigm for building end-to-end information-seeking agents from a data-centric and training-stage perspective. Our approach consists of four key stages: (1) browsing data construction, (2) trajectory sampling, (3) supervised fine-tuning for an effective cold start, and (4) reinforcement learning for enhanced generalisation. We instantiate this framework in WebDancer, a web agent based on ReAct. Empirical evaluations on two challenging information-seeking benchmarks, GAIA and WebWalkerQA, demonstrate the strong performance of WebDancer and highlight the efficacy of our training paradigm. Further analysis of agent training provides valuable insights and actionable, systematic pathways for developing more capable agentic models. The code and demo will be released at https://github.com/Alibaba-NLP/WebAgent.
