LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
October 16, 2025
Authors: Yiming Wang, Da Yin, Yuedong Cui, Ruichen Zheng, Zhiqian Li, Zongyu Lin, Di Wu, Xueqing Wu, Chenchen Ye, Yu Zhou, Kai-Wei Chang
cs.AI
Abstract
Digital agents require diverse, large-scale UI trajectories to generalize
across real-world tasks, yet collecting such data is prohibitively expensive in
terms of human annotation, infrastructure, and engineering effort. To this end, we
introduce UI-Simulator, a scalable paradigm that generates
structured UI states and transitions to synthesize training trajectories at
scale. Our paradigm integrates a digital world simulator for diverse UI states,
a guided rollout process for coherent exploration, and a trajectory wrapper
that produces high-quality and diverse trajectories for agent training. We
further propose UI-Simulator-Grow, a targeted scaling strategy that
enables more rapid and data-efficient scaling by prioritizing high-impact tasks
and synthesizing informative trajectory variants. Experiments on WebArena and
AndroidWorld show that UI-Simulator rivals or surpasses open-source agents
trained on real UIs while being significantly more robust, despite using weaker
teacher models. Moreover, UI-Simulator-Grow matches the performance of
Llama-3-70B-Instruct using only Llama-3-8B-Instruct as the base model,
highlighting the potential of the targeted synthesis scaling paradigm to
continuously and efficiently improve digital agents.
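
The three components named in the abstract (a world simulator, a guided rollout, and a trajectory wrapper) can be pictured with a toy sketch. This is not the authors' implementation: every name below (`UIState`, `toy_simulator`, `toy_policy`, `rollout`, `wrap`) is hypothetical, and the LLM-backed simulator is replaced by a deterministic stub so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class UIState:
    """A structured UI state: a screen name plus its clickable elements."""
    screen: str
    elements: List[str]


@dataclass
class Trajectory:
    """A task instruction paired with (state, action) steps."""
    instruction: str
    steps: List[Tuple[UIState, str]] = field(default_factory=list)


def toy_simulator(state: UIState, action: str) -> UIState:
    """Assumption: stands in for an LLM that predicts the next UI state."""
    target = action.split(":", 1)[1]
    return UIState(screen=target, elements=[f"{target}_item_{i}" for i in range(2)])


def toy_policy(state: UIState, visited: Set[str]) -> str:
    """Guided step: prefer an element whose target screen is unvisited."""
    for element in state.elements:
        if element not in visited:
            return f"click:{element}"
    return f"click:{state.elements[0]}"


def rollout(
    simulate: Callable[[UIState, str], UIState],
    propose: Callable[[UIState, Set[str]], str],
    start: UIState,
    max_steps: int,
) -> List[Tuple[UIState, str]]:
    """Alternate between proposing an action and simulating its outcome."""
    visited = {start.screen}
    steps: List[Tuple[UIState, str]] = []
    state = start
    for _ in range(max_steps):
        action = propose(state, visited)
        steps.append((state, action))
        state = simulate(state, action)
        visited.add(state.screen)
    return steps


def wrap(steps: List[Tuple[UIState, str]]) -> Trajectory:
    """Wrapper: attach an instruction describing where the rollout ended."""
    final_target = steps[-1][1].split(":", 1)[1]
    return Trajectory(instruction=f"Navigate to {final_target}", steps=steps)


start = UIState(screen="home", elements=["settings", "search"])
trajectory = wrap(rollout(toy_simulator, toy_policy, start, max_steps=3))
print(trajectory.instruction)
```

In this sketch the simulator and the policy are passed in as plain callables, so a model-backed version of either could be substituted without changing the rollout loop.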