ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training
February 6, 2026
Authors: Dunwei Tu, Hongyan Hao, Hansi Yang, Yihao Chen, Yi-Kai Zhang, Zhikang Xia, Yu Yang, Yueqing Sun, Xingchen Liu, Furao Shen, Qi Gu, Hui Su, Xunliang Cai
cs.AI
Abstract
Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, interactive environments remain critically scarce, and existing synthesis methods suffer from significant limitations in environmental diversity and scalability. To address these challenges, we introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. Specifically, ScaleEnv ensures environment reliability through procedural testing, and guarantees task completeness and solvability via tool dependency graph expansion and executable action verification. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks such as τ^2-Bench and VitaBench, highlighting strong generalization capabilities. Furthermore, we investigate the relationship between the number of domains and model generalization performance, providing empirical evidence that scaling environmental diversity is critical for robust agent learning.
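To make the "tool dependency graph expansion and executable action verification" idea concrete, here is a minimal illustrative sketch. The tool names, the needs/produces schema, and the greedy expansion loop are all assumptions for illustration, not the paper's actual implementation: each tool is modeled by the argument types it consumes and the type it produces, a task's tool chain is expanded by attaching tools whose inputs are already producible, and solvability is checked by confirming the target type becomes reachable.

```python
def expand(seed_types, tools):
    """Greedily add tools whose inputs are satisfied by the currently
    producible types; return the dependency-respecting tool chain and
    the final set of producible types."""
    available = set(seed_types)
    chain = []
    changed = True
    while changed:
        changed = False
        for name, spec in tools.items():
            if name not in chain and spec["needs"] <= available:
                chain.append(name)
                available.add(spec["produces"])
                changed = True
    return chain, available

# Hypothetical customer-service domain: four tools forming a chain.
TOOLS = {
    "search_user":  {"needs": {"name"},                 "produces": "user_id"},
    "get_orders":   {"needs": {"user_id"},              "produces": "order_id"},
    "cancel_order": {"needs": {"order_id"},             "produces": "refund_id"},
    "notify_user":  {"needs": {"user_id", "refund_id"}, "produces": "receipt"},
}

chain, types = expand({"name"}, TOOLS)
print(chain)                # tool order respecting the dependency graph
print("receipt" in types)   # the task's goal type is reachable, so it is solvable
```

A real verifier would additionally execute each action against the environment's state rather than only type-checking it, but the reachability check above captures why graph expansion keeps every synthesized task solvable end to end.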