WebWorld: A Large-Scale World Model for Web Agent Training
February 16, 2026
Authors: Zikai Xiao, Jianhong Tu, Chuhang Zou, Yuxin Zuo, Zhi Li, Peng Wang, Bowen Yu, Fei Huang, Junyang Lin, Zuozhu Liu
cs.AI
Abstract
Web agents need massive amounts of trajectory data to generalize, yet real-world training is constrained by network latency, rate limits, and safety risks. We introduce the WebWorld series, the first open-web simulator trained at scale. While existing simulators are restricted to closed environments with thousands of trajectories, WebWorld leverages a scalable data pipeline to train on 1M+ open-web interactions, supporting reasoning, multi-format data, and long-horizon simulations of 30+ steps. For intrinsic evaluation, we introduce WebWorld-Bench, with dual metrics spanning nine dimensions, on which WebWorld achieves simulation performance comparable to Gemini-3-Pro. For extrinsic evaluation, Qwen3-14B trained on WebWorld-synthesized trajectories improves by 9.2% on WebArena, reaching performance comparable to GPT-4o. WebWorld also enables effective inference-time search, outperforming GPT-5 as a world model. Beyond web simulation, WebWorld exhibits cross-domain generalization to code, GUI, and game environments, providing a replicable recipe for world model construction.