ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training
February 6, 2026
Authors: Dunwei Tu, Hongyan Hao, Hansi Yang, Yihao Chen, Yi-Kai Zhang, Zhikang Xia, Yu Yang, Yueqing Sun, Xingchen Liu, Furao Shen, Qi Gu, Hui Su, Xunliang Cai
cs.AI
Abstract
Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, such interactive environments remain critically scarce, and existing synthesis methods suffer from significant limitations in environmental diversity and scalability. To address these challenges, we introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. Specifically, ScaleEnv ensures environment reliability through procedural testing, and guarantees task completeness and solvability via tool dependency graph expansion and executable action verification. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen multi-turn tool-use benchmarks such as τ^2-Bench and VitaBench, highlighting strong generalization capabilities. Furthermore, we investigate the relationship between the number of training domains and model generalization performance, providing empirical evidence that scaling environmental diversity is critical for robust agent learning.