AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
November 24, 2025
Authors: Jiayi Zhang, Yiran Peng, Fanqi Kong, Yang Cheng, Yifan Wu, Zhaoyang Yu, Jinyu Xiang, Jianhao Ruan, Jinlin Wang, Maojia Song, HongZhang Liu, Xiangru Tang, Bang Liu, Chenglin Wu, Yuyu Luo
cs.AI
Abstract
Humans naturally adapt to diverse environments by learning the underlying rules of worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically improve by self-evolving within a single domain, implicitly assuming a fixed environment distribution. Cross-environment learning has remained largely unmeasured: there is no standard collection of controllable, heterogeneous environments, nor a unified way to represent how agents learn. We address these gaps in two steps. First, we propose AutoEnv, an automated framework that treats environments as factorizable distributions over transitions, observations, and rewards, enabling low-cost (4.12 USD on average) generation of heterogeneous worlds. Using AutoEnv, we construct AutoEnv-36, a dataset of 36 environments with 358 validated levels, on which seven language models achieve only 12-49% normalized reward, demonstrating the difficulty of AutoEnv-36. Second, we formalize agent learning as a component-centric process driven by three stages of Selection, Optimization, and Evaluation applied to an improvable agent component. Using this formulation, we design eight learning methods and evaluate them on AutoEnv-36. Empirically, the gain of any single learning method decreases quickly as the number of environments increases, revealing that fixed learning methods do not scale across heterogeneous environments. Environment-adaptive selection of learning methods substantially improves performance but exhibits diminishing returns as the method space expands. These results highlight both the necessity and the current limitations of agent learning for scalable cross-environment generalization, and position AutoEnv and AutoEnv-36 as a testbed for studying cross-environment agent learning. The code is available at https://github.com/FoundationAgents/AutoEnv.
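To make the "environments as factorizable distributions" idea concrete, here is a minimal sketch in which a world is specified by three independently swappable components for transitions, observations, and rewards. All names here (FactoredEnv, step, and the type aliases) are illustrative assumptions, not the AutoEnv API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Hypothetical sketch: a world is the product of three independent factors.
# None of these names come from the AutoEnv codebase.
State = Any
Action = Any
Observation = Any

@dataclass
class FactoredEnv:
    transition: Callable[[State, Action], State]     # state dynamics
    observe: Callable[[State], Observation]          # observation model
    reward: Callable[[State, Action, State], float]  # reward structure

    def step(self, state: State, action: Action) -> Tuple[State, Observation, float]:
        # One environment step composed from the three factors.
        next_state = self.transition(state, action)
        return next_state, self.observe(next_state), self.reward(state, action, next_state)
```

Because the three factors vary independently, a generator can sample each axis separately and recombine them, which is one plausible way such a framework could produce many heterogeneous worlds at low cost.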
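The component-centric learning process could likewise be sketched as a loop over the three stages named in the abstract; the stage names follow the paper, while the function signatures and the greedy accept/reject rule are assumptions made for illustration.

```python
# Hypothetical sketch of the Selection -> Optimization -> Evaluation loop
# over improvable agent components (e.g., prompt, memory, policy).
# All callables are passed in as parameters; none are the paper's API.

def learn(agent, env, components, select, optimize, evaluate, iterations=5):
    best_score = evaluate(agent, env)  # e.g., normalized reward on env
    for _ in range(iterations):
        # Selection: pick which improvable component to modify.
        component = select(agent, env, components)
        # Optimization: propose an agent with an improved component.
        candidate = optimize(agent, component, env)
        # Evaluation: keep the change only if the score improves.
        score = evaluate(candidate, env)
        if score > best_score:
            agent, best_score = candidate, score
    return agent, best_score
```

Under this framing, a fixed learning method corresponds to fixing the select and optimize callables, while environment-adaptive selection of methods varies them per environment, which matches the comparison reported in the abstract.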