WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

January 5, 2026
Authors: Hao Bai, Alexey Taymanov, Tong Zhang, Aviral Kumar, Spencer Whitehead
cs.AI

Abstract

We present WebGym, the largest open-source environment to date for training realistic visual web agents. Real websites are non-stationary and diverse, so artificial or small-scale task sets are insufficient for robust policy learning. WebGym contains nearly 300,000 tasks with rubric-based evaluations, spanning diverse real-world websites and difficulty levels. We train agents with a simple reinforcement learning (RL) recipe: the agent is trained on its own interaction traces (rollouts), with task rewards as the learning signal. First, to enable scaling RL, we speed up trajectory sampling in WebGym with a high-throughput asynchronous rollout system designed specifically for web agents; it achieves a 4-5x rollout speedup over naive implementations. Second, we scale the task set in breadth, depth, and size, which yields continued performance improvements. Fine-tuning a strong base vision-language model, Qwen-3-VL-8B-Instruct, on WebGym raises its success rate on an out-of-distribution test set from 26.2% to 42.9%, significantly outperforming agents built on proprietary models such as GPT-4o (27.1%) and GPT-5-Thinking (29.8%). This gain is notable because, unlike the test sets in many prior works on training visual web agents, ours consists only of tasks on websites never seen during training.
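The abstract credits part of WebGym's scalability to an asynchronous rollout system but does not describe its design. As a hedged illustration of why asynchrony can help, the Python sketch below compares a hypothetical naive baseline, which runs episodes in lockstep batches and waits for the slowest environment at every step, against a pool of asynchronous workers that each start a new task as soon as their current episode ends. Browser actions are simulated with `asyncio.sleep`; the function names (`make_workload`, `lockstep_batches`, `async_workers`), latency ranges, episode lengths, batch size, and worker count are all assumptions for illustration. This is not WebGym's implementation, and the speedup it prints is not comparable to the reported 4-5x.

```python
import asyncio
import random
import time


def make_workload(n_tasks: int, seed: int = 0) -> list[list[float]]:
    """Hypothetical workload: for each task, a list of per-step latencies (s).
    Episodes have different lengths and page loads take variable time."""
    rng = random.Random(seed)
    return [
        [rng.uniform(0.002, 0.02) for _ in range(rng.randint(2, 10))]
        for _ in range(n_tasks)
    ]


async def browser_step(latency: float) -> None:
    # Stand-in for one browser action (click, type, page load).
    await asyncio.sleep(latency)


async def run_episode(latencies: list[float]) -> int:
    for latency in latencies:
        await browser_step(latency)
    return len(latencies)  # number of steps taken


async def lockstep_batches(workload: list[list[float]], batch_size: int) -> list[int]:
    """Naive baseline (assumed): step all episodes in a batch together and
    start the next batch only after the longest episode finishes."""
    results: list[int] = []
    for start in range(0, len(workload), batch_size):
        batch = workload[start:start + batch_size]
        horizon = max(len(ep) for ep in batch)
        for t in range(horizon):
            # Every step waits for the slowest still-running environment.
            await asyncio.gather(*(browser_step(ep[t]) for ep in batch if t < len(ep)))
        results.extend(len(ep) for ep in batch)
    return results


async def async_workers(workload: list[list[float]], n_workers: int) -> list[int]:
    """Asynchronous rollouts: each worker pulls the next task as soon as its
    current episode ends, so no slot idles waiting on other episodes."""
    queue: asyncio.Queue = asyncio.Queue()
    for i, ep in enumerate(workload):
        queue.put_nowait((i, ep))
    results = [0] * len(workload)

    async def worker() -> None:
        while True:
            try:
                i, ep = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # no tasks left
            results[i] = await run_episode(ep)

    await asyncio.gather(*(worker() for _ in range(n_workers)))
    return results


def timed(coro):
    """Run a coroutine to completion and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    out = asyncio.run(coro)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    workload = make_workload(32)
    naive, t_naive = timed(lockstep_batches(workload, batch_size=8))
    fast, t_async = timed(async_workers(workload, n_workers=8))
    assert naive == fast  # same episodes, same step counts
    print(f"lockstep: {t_naive:.2f}s  async: {t_async:.2f}s  "
          f"speedup: {t_naive / t_async:.1f}x")
```

The design choice the sketch isolates is decoupling episodes from one another: when page-load times and episode lengths vary, lockstep batching leaves slots idle while the slowest episode finishes, whereas a worker pool keeps every slot busy until the task queue drains.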
January 8, 2026