

AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

November 24, 2025
作者: Jiayi Zhang, Yiran Peng, Fanqi Kong, Yang Cheng, Yifan Wu, Zhaoyang Yu, Jinyu Xiang, Jianhao Ruan, Jinlin Wang, Maojia Song, HongZhang Liu, Xiangru Tang, Bang Liu, Chenglin Wu, Yuyu Luo
cs.AI

Abstract

Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically demonstrate improvements via self-evolving within a single domain, implicitly assuming a fixed environment distribution. Cross-environment learning has remained largely unmeasured: there is no standard collection of controllable, heterogeneous environments, nor a unified way to represent how agents learn. We address these gaps in two steps. First, we propose AutoEnv, an automated framework that treats environments as factorizable distributions over transitions, observations, and rewards, enabling low-cost (4.12 USD on average) generation of heterogeneous worlds. Using AutoEnv, we construct AutoEnv-36, a dataset of 36 environments with 358 validated levels, on which seven language models achieve only 12-49% normalized reward, demonstrating the difficulty of AutoEnv-36. Second, we formalize agent learning as a component-centric process driven by three stages of Selection, Optimization, and Evaluation applied to an improvable agent component. Using this formulation, we design eight learning methods and evaluate them on AutoEnv-36. Empirically, the gain of any single learning method quickly decreases as the number of environments increases, revealing that fixed learning methods do not scale across heterogeneous environments. Environment-adaptive selection of learning methods substantially improves performance but exhibits diminishing returns as the method space expands. These results highlight both the necessity and the current limitations of agent learning for scalable cross-environment generalization, and position AutoEnv and AutoEnv-36 as a testbed for studying cross-environment agent learning. The code is available at https://github.com/FoundationAgents/AutoEnv.
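The two ideas in the abstract, an environment factored into transition, observation, and reward components, and a component-centric Selection, Optimization, Evaluation learning loop, can be sketched as follows. This is a minimal illustrative sketch, not the AutoEnv API: the names `FactoredEnv`, `rollout`, and `learn`, and the toy integer-valued policy component, are all assumptions made here for clarity.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# An environment treated as a factorizable triple of transition,
# observation, and reward functions over integer states/actions.
@dataclass
class FactoredEnv:
    transition: Callable[[int, int], int]   # (state, action) -> next state
    observation: Callable[[int], int]       # state -> observation
    reward: Callable[[int, int], float]     # (state, action) -> reward

def rollout(env: FactoredEnv, policy: Callable[[int], int], steps: int = 10) -> float:
    """Run one short episode and return the total reward."""
    state, total = 0, 0.0
    for _ in range(steps):
        action = policy(env.observation(state))
        total += env.reward(state, action)
        state = env.transition(state, action)
    return total

def learn(env: FactoredEnv, candidates: List[int], rounds: int = 5) -> int:
    """Component-centric loop: Selection -> Optimization -> Evaluation
    applied to an improvable component (here, a policy offset)."""
    best, best_score = candidates[0], float("-inf")
    for _ in range(rounds):
        # Selection: pick a component variant to improve.
        variant = random.choice(candidates)
        # Optimization: perturb the selected component (toy mutation step).
        variant = variant + random.choice([-1, 0, 1])
        # Evaluation: score the updated component in the environment.
        score = rollout(env, policy=lambda obs: (obs + variant) % 2)
        if score > best_score:
            best, best_score = variant, score
    return best
```

The factorization is what lets a generator swap one component (say, the reward structure) while holding the others fixed, which is the property the abstract relies on for producing controllable, heterogeneous worlds.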