AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

February 15, 2026
Authors: Yifan Wu, Yiran Peng, Yiyu Chen, Jianhao Ruan, Zijie Zhuang, Cheng Yang, Jiayi Zhang, Man Chen, Yenchi Tseng, Zhaoyang Yu, Liang Chen, Yuyao Zhai, Bang Liu, Chenglin Wu, Yuyu Luo
cs.AI

Abstract

The performance of autonomous Web GUI agents depends heavily on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive, and the trajectories are difficult to verify. Because the underlying state transitions are hidden, step-level correctness has to be judged by external verifiers that are inconsistent and costly. To address this, we propose AutoWebWorld, a novel framework that synthesizes controllable and verifiable web environments by modeling them as Finite State Machines (FSMs) and using coding agents to translate the FSMs into interactive websites. Unlike real websites, where state transitions are implicit, AutoWebWorld explicitly defines all states, actions, and transition rules. This enables programmatic verification: action correctness is checked against predefined rules, and task success is confirmed by reaching a goal state in the FSM graph. AutoWebWorld enables a fully automated search-and-verify pipeline, generating over 11,663 verified trajectories from 29 diverse web environments at only $0.04 per trajectory. Training on this synthetic data significantly boosts real-world performance: our 7B Web GUI agent outperforms all baselines within 15 steps on WebVoyager. Furthermore, we observe a clear scaling law: as the synthetic data volume increases, performance on WebVoyager and Online-Mind2Web consistently improves.
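The idea of programmatic verification over an explicit FSM can be illustrated with a minimal sketch. This is not the authors' implementation: the toy shopping-site states, the action names, and the `search` and `verify` helpers are all hypothetical, and breadth-first search is used here only as one plausible way to enumerate trajectories. The sketch shows how an explicit transition table allows both step-level checks (is this action defined in this state?) and task-level checks (did the trajectory end in the goal state?).

```python
from collections import deque

# Hypothetical toy FSM for a small shopping site: (state, action) -> next state.
# State and action names are illustrative assumptions, not from the paper.
TRANSITIONS = {
    ("home", "click_search"): "results",
    ("results", "click_item"): "product",
    ("product", "click_add_to_cart"): "cart",
    ("product", "click_back"): "results",
    ("cart", "click_checkout"): "confirmation",
}


def verify(trajectory, start, goal):
    """Replay actions against the explicit transition table.

    Returns (ok, steps_checked). ok is True only if every action is
    defined in the state where it is taken and the final state is the goal.
    """
    state = start
    for i, action in enumerate(trajectory):
        nxt = TRANSITIONS.get((state, action))
        if nxt is None:
            return False, i  # step-level failure: action not allowed here
        state = nxt
    return state == goal, len(trajectory)


def search(start, goal):
    """Breadth-first search for a shortest action sequence reaching goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for (src, action), dst in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [action]))
    return None  # goal is unreachable from start


if __name__ == "__main__":
    path = search("home", "confirmation")
    print(path)
    print(verify(path, "home", "confirmation"))
    print(verify(["click_checkout"], "home", "confirmation"))
```

Because every transition is declared up front, no external judge is needed in this toy setting: a trajectory found by `search` is accepted only if `verify` replays it successfully, and an action that is undefined in the current state is rejected at the exact step where it occurs.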