
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

December 22, 2025
Authors: Jiacheng Guo, Ling Yang, Peter Chen, Qixin Xiao, Yinjie Wang, Xinzhe Juan, Jiahao Qiu, Ke Shen, Mengdi Wang
cs.AI

Abstract

Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv instantiates a data-evolving paradigm: the simulator acts as a dynamic curriculum policy, continuously generating tasks specifically tailored to the agent's "zone of proximal development". This process is guided by a simple but effective α-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to +40.3% over 7B baselines and matches or exceeds the average performance of larger models. Compared to Gemini 2.5 Pro-based offline data augmentation, GenEnv achieves better performance while using 3.3× less data. By shifting from static supervision to adaptive simulation, GenEnv provides a data-efficient pathway for scaling agent capabilities.
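The abstract describes the α-Curriculum Reward only at a high level. As an illustration, below is a minimal Python sketch of one plausible form of such a reward, assuming the simulator is scored by how close the agent's empirical solve rate on a generated task is to a target difficulty alpha; the function names, the agent.attempt interface, and the exact reward shape are hypothetical stand-ins, not taken from the paper.

# Hypothetical sketch of a difficulty-aligned curriculum reward.
# Assumption: the simulator earns the most reward when the current
# agent solves a generated task about `alpha` of the time, i.e. the
# task lies in the agent's zone of proximal development (neither
# trivial nor impossible). The paper's exact formulation may differ.

def alpha_curriculum_reward(solve_rate: float, alpha: float = 0.5) -> float:
    """Peak reward at solve_rate == alpha, decaying linearly to 0
    as the task becomes too easy (rate -> 1) or too hard (rate -> 0)."""
    return 1.0 - abs(solve_rate - alpha)

def estimate_solve_rate(agent, task, n_rollouts: int = 8) -> float:
    """Monte-Carlo estimate of the agent's success rate on one task.
    `agent.attempt(task)` is an assumed interface returning True/False."""
    successes = sum(bool(agent.attempt(task)) for _ in range(n_rollouts))
    return successes / n_rollouts

Under this reading, the simulator and agent co-evolve: as the agent improves, previously hard tasks drift toward a solve rate above alpha and stop paying reward, pushing the simulator to generate harder ones.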