GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
December 22, 2025
Authors: Jiacheng Guo, Ling Yang, Peter Chen, Qixin Xiao, Yinjie Wang, Xinzhe Juan, Jiahao Qiu, Ke Shen, Mengdi Wang
cs.AI
Abstract
Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv instantiates a data-evolving paradigm: the simulator acts as a dynamic curriculum policy, continuously generating tasks tailored to the agent's "zone of proximal development". This process is guided by a simple but effective α-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to +40.3% over 7B baselines and matches or exceeds the average performance of larger models. Compared to Gemini 2.5 Pro-based offline data augmentation, GenEnv achieves better performance while using 3.3× less data. By shifting from static supervision to adaptive simulation, GenEnv provides a data-efficient pathway for scaling agent capabilities.
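To make the co-evolution loop concrete, below is a minimal Python sketch of one plausible reading of the α-Curriculum Reward. The abstract does not give the exact formula, so this assumes the simulator is rewarded for proposing tasks whose empirical success rate under the current agent stays near a target level α (the "zone of proximal development"); all function names and the linear reward shape are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of a difficulty-aligned curriculum reward.
# Assumption: the simulator earns the highest reward when the agent's
# observed success rate on a generated task equals a target alpha,
# with the reward decaying linearly for tasks that are too easy or
# too hard. Names below are illustrative stand-ins.

from typing import Callable, Sequence


def alpha_curriculum_reward(
    successes: Sequence[bool],
    alpha: float = 0.5,
) -> float:
    """Score one generated task by how close the agent's pass rate
    over several rollouts lands to the target success rate alpha."""
    if not successes:
        return 0.0
    rate = sum(successes) / len(successes)
    return 1.0 - abs(rate - alpha)


def co_evolution_step(
    generate_task: Callable[[], str],
    run_agent: Callable[[str], bool],
    rollouts: int = 8,
    alpha: float = 0.5,
) -> float:
    """One step of the agent/simulator game: sample a task from the
    simulator, roll out the agent several times, and return the
    curriculum reward used to update the simulator's task policy."""
    task = generate_task()
    outcomes = [run_agent(task) for _ in range(rollouts)]
    return alpha_curriculum_reward(outcomes, alpha=alpha)


if __name__ == "__main__":
    import random

    # Toy stand-ins for the generative simulator and the LLM agent.
    toy_task = lambda: "call the weather API and report tomorrow's forecast"
    toy_agent = lambda task: random.random() < 0.4  # ~40% success rate

    print(co_evolution_step(toy_task, toy_agent))
```

Under this reading, α acts as the knob that keeps generated tasks neither trivially solvable nor hopeless, so the curriculum tracks the agent as it improves.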