LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
October 16, 2025
作者: Yiming Wang, Da Yin, Yuedong Cui, Ruichen Zheng, Zhiqian Li, Zongyu Lin, Di Wu, Xueqing Wu, Chenchen Ye, Yu Zhou, Kai-Wei Chang
cs.AI
Abstract
Digital agents require diverse, large-scale UI trajectories to generalize across real-world tasks, yet collecting such data is prohibitively expensive from the human-annotation, infrastructure, and engineering perspectives. To this end, we introduce UI-Simulator, a scalable paradigm that generates structured UI states and transitions to synthesize training trajectories at scale. Our paradigm integrates a digital world simulator that produces diverse UI states, a guided rollout process for coherent exploration, and a trajectory wrapper that yields high-quality, diverse trajectories for agent training. We further propose UI-Simulator-Grow, a targeted scaling strategy that enables more rapid and data-efficient scaling by prioritizing high-impact tasks and synthesizing informative trajectory variants. Experiments on WebArena and AndroidWorld show that UI-Simulator rivals or surpasses open-source agents trained on real UIs while being significantly more robust, despite using weaker teacher models. Moreover, UI-Simulator-Grow matches the performance of Llama-3-70B-Instruct using only Llama-3-8B-Instruct as the base model, highlighting the potential of the targeted synthesis scaling paradigm to continuously and efficiently enhance digital agents.
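
As a rough illustration of the pipeline the abstract describes, the sketch below shows how an LLM world simulator, a guided rollout loop, and a trajectory wrapper might fit together. It is a minimal sketch, not the paper's implementation: every name here (Step, Trajectory, simulate_next_state, guided_rollout, wrap_trajectory, and the llm callable) is a hypothetical placeholder.

```python
# Minimal sketch of a simulate -> rollout -> wrap pipeline, assuming `llm`
# is any callable that maps a text prompt to a text completion.
from dataclasses import dataclass, field


@dataclass
class Step:
    state: str   # structured UI state proposed by the simulator LLM
    action: str  # agent action chosen in that state


@dataclass
class Trajectory:
    task: str
    steps: list[Step] = field(default_factory=list)


def simulate_next_state(llm, task: str, history: list[Step]) -> str:
    """Digital world simulator: ask the LLM for the next structured UI state."""
    prompt = f"Task: {task}\nHistory: {history}\nReturn the next UI state."
    return llm(prompt)


def guided_rollout(llm, task: str, max_steps: int = 10) -> Trajectory:
    """Guided rollout: explore coherently, letting the task steer each step."""
    traj = Trajectory(task=task)
    for _ in range(max_steps):
        state = simulate_next_state(llm, task, traj.steps)
        action = llm(f"Task: {task}\nState: {state}\nChoose the next action.")
        traj.steps.append(Step(state=state, action=action))
        if "STOP" in action:  # simulator/agent signals task completion
            break
    return traj


def wrap_trajectory(llm, traj: Trajectory) -> Trajectory | None:
    """Trajectory wrapper: keep only coherent rollouts for agent training."""
    verdict = llm(f"Is this trajectory coherent for '{traj.task}'? {traj.steps}")
    return traj if "yes" in verdict.lower() else None
```

A driver loop would call guided_rollout over many synthesized tasks and keep only the trajectories that survive wrap_trajectory, producing the synthetic training set used to fine-tune the agent.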