Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

February 10, 2026
Authors: Zhaoyang Wang, Canwen Xu, Boyi Liu, Yite Wang, Siwei Han, Zhewei Yao, Huaxiu Yao, Yuxiong He
cs.AI

Abstract

Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets (35 tools per environment on average) and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, so they provide more reliable and consistent state transitions than environments simulated by LLMs. They also enable more efficient agent interaction than collecting trajectories from realistic environments. To demonstrate the effectiveness of this resource, we perform large-scale reinforcement learning for multi-turn tool-use agents. Because the environments are fully executable and their database states are accessible, we can also design reliable reward functions. Experiments on three benchmarks show that training exclusively in synthetic environments, rather than benchmark-specific ones, yields strong out-of-distribution generalization. The code is available at https://github.com/Snowflake-Labs/agent-world-model.
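To make the idea of a code-driven, database-backed environment with a state-based reward more concrete, here is a minimal sketch. It is not taken from the AWM codebase: the `TodoEnv` class, its two tools, the `step` dispatcher, and the `reward` function are all hypothetical names chosen for illustration, and a real AWM environment would expose far more tools (about 35 on average, per the abstract). The sketch only shows the general pattern: tools are ordinary functions that mutate a database, and the reward is computed by querying the resulting database state rather than by asking an LLM to judge the trajectory.

```python
import sqlite3


class TodoEnv:
    """Hypothetical toy environment: tools are plain functions over a SQLite database."""

    def __init__(self):
        # An in-memory database holds the full environment state.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE todos ("
            "id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)"
        )
        self.db.execute("INSERT INTO todos (title) VALUES ('buy milk')")
        self.db.commit()

    def add_todo(self, title):
        # Tool 1: insert a row and return a structured observation.
        cur = self.db.execute("INSERT INTO todos (title) VALUES (?)", (title,))
        self.db.commit()
        return {"id": cur.lastrowid, "title": title, "done": False}

    def complete_todo(self, todo_id):
        # Tool 2: mark a row as done and report how many rows changed.
        cur = self.db.execute("UPDATE todos SET done = 1 WHERE id = ?", (todo_id,))
        self.db.commit()
        return {"updated": cur.rowcount}

    def step(self, tool, **args):
        # Dispatch a tool call by name; unknown tools yield an error observation.
        tools = {"add_todo": self.add_todo, "complete_todo": self.complete_todo}
        if tool not in tools:
            return {"error": f"unknown tool: {tool}"}
        return tools[tool](**args)


def reward(env):
    """Return 1.0 if the (assumed) task 'mark buy milk as done' holds in the database."""
    row = env.db.execute(
        "SELECT done FROM todos WHERE title = 'buy milk'"
    ).fetchone()
    return 1.0 if row is not None and row[0] == 1 else 0.0


env = TodoEnv()
before = reward(env)                        # task not yet completed
obs = env.step("complete_todo", todo_id=1)  # agent action as a tool call
after = reward(env)                         # reward read from database state
print(before, obs, after)
```

Because the state transition is ordinary code running against a database, the same tool call always produces the same state change, and the reward check reduces to a deterministic query.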