AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines
February 15, 2026
Authors: Yifan Wu, Yiran Peng, Yiyu Chen, Jianhao Ruan, Zijie Zhuang, Cheng Yang, Jiayi Zhang, Man Chen, Yenchi Tseng, Zhaoyang Yu, Liang Chen, Yuyao Zhai, Bang Liu, Chenglin Wu, Yuyu Luo
cs.AI
Abstract
The performance of autonomous Web GUI agents depends heavily on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive, and the trajectories are difficult to verify. Because the underlying state transitions are hidden, step-level correctness has to be judged by external verifiers that are inconsistent and costly. To address this, we propose AutoWebWorld, a novel framework for synthesizing controllable and verifiable web environments by modeling them as Finite State Machines (FSMs) and using coding agents to translate the FSMs into interactive websites. Unlike real websites, where state transitions are implicit, AutoWebWorld explicitly defines all states, actions, and transition rules. This enables programmatic verification: action correctness is checked against predefined rules, and task success is confirmed by reaching a goal state in the FSM graph. AutoWebWorld enables a fully automated search-and-verify pipeline, generating over 11,663 verified trajectories from 29 diverse web environments at only $0.04 per trajectory. Training on this synthetic data significantly boosts real-world performance: our 7B Web GUI agent outperforms all baselines within 15 steps on WebVoyager. Furthermore, we observe a clear scaling law: as the synthetic data volume increases, performance on WebVoyager and Online-Mind2Web consistently improves.
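To make the search-and-verify idea concrete, here is a minimal Python sketch of an FSM with explicit transitions. It is not the authors' implementation: the toy shopping-site states, the action names, and the helper functions `verify` and `search` are all hypothetical, and breadth-first search is simply one plausible way to enumerate a path to a goal state.

```python
from collections import deque

# Hypothetical toy FSM for a shopping site. The states and actions below are
# illustrative assumptions, not taken from AutoWebWorld itself.
TRANSITIONS = {
    ("home", "click_search"): "results",
    ("results", "click_item"): "product",
    ("results", "click_back"): "home",
    ("product", "click_add_to_cart"): "cart",
    ("product", "click_back"): "results",
}


def verify(trajectory, start, goal):
    """Check a list of actions against the explicit transition rules.

    Returns (ok, steps_checked). ok is True only if every action is a
    defined transition from the current state and the final state is goal.
    """
    state = start
    for i, action in enumerate(trajectory):
        next_state = TRANSITIONS.get((state, action))
        if next_state is None:
            return False, i  # action i is not allowed in the current state
        state = next_state
    return state == goal, len(trajectory)


def search(start, goal, max_steps):
    """Breadth-first search for a shortest action sequence from start to goal.

    Returns the list of actions, or None if goal is unreachable within
    max_steps actions.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        if len(path) >= max_steps:
            continue
        for (src, action), dst in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [action]))
    return None


if __name__ == "__main__":
    path = search("home", "cart", max_steps=15)
    print(path)
    print(verify(path, "home", "cart"))
```

Because every state and transition is declared up front, both checks reduce to dictionary lookups: an invalid action is caught at the exact step where it occurs, and task success is just a comparison of the final state with the goal, with no external judge involved.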