Qwen-AgentWorld: Language World Models for General Agents

Abstract

A world model predicts environment dynamics from current observations and actions, and serves as a core cognitive mechanism for an agent's reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation models for agentic environment simulation. We introduce Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, the first language world models capable of simulating agentic environments across 7 domains via long chain-of-thought reasoning. Leveraging more than 10M interaction trajectories collected from real-world environments in these 7 domains, we develop Qwen-AgentWorld through a three-stage training pipeline: continual pre-training (CPT) injects general-purpose world modeling capabilities from state transition dynamics and augmented professional corpora, supervised fine-tuning (SFT) activates next-state-prediction reasoning, and reinforcement learning (RL) sharpens simulation fidelity through a tailored framework with hybrid rubric-and-rule rewards. To evaluate language world models, we present AgentWorldBench, a comprehensive benchmark constructed from real-world interactions of 5 frontier models on 9 established benchmarks. Empirical results show that Qwen-AgentWorld significantly outperforms existing frontier models. (ii) Beyond foundation models, we further investigate two complementary paradigms through which world modeling enhances general agents. First, as a decoupled environment simulator, Qwen-AgentWorld supports scalable and controllable simulation of thousands of real-world environments for agentic RL, yielding gains that surpass training on real environments alone. Second, as a unified agent foundation model, world-model training acts as a highly effective warm-up that improves downstream performance across 7 agentic benchmarks. Code: https://github.com/QwenLM/Qwen-AgentWorld
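The next-state-prediction setting described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `Transition`, `toy_world_model`, and `rollout` are hypothetical and are not part of the Qwen-AgentWorld code base, and the toy model is a hand-written rule-based stand-in for a language world model, not the authors' method. It only shows the interface shape: given the interaction history, the current observation, and an action, the model returns a predicted next observation, so an agent can be rolled out without touching a real environment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transition:
    """One step of interaction: the observation seen and the action taken."""
    observation: str
    action: str


# Hypothetical interface: (history, current observation, action) -> next observation.
WorldModel = Callable[[List[Transition], str, str], str]


def toy_world_model(history: List[Transition], observation: str, action: str) -> str:
    """Rule-based stand-in for a language world model (assumption, not the real model).

    It simulates a tiny shell that understands `touch <name>` and `ls`.
    """
    files = set()
    # Replay every action so far, including the current one, to rebuild the state.
    for past in history + [Transition(observation, action)]:
        parts = past.action.split()
        if len(parts) == 2 and parts[0] == "touch":
            files.add(parts[1])
    if action == "ls":
        return " ".join(sorted(files))
    return ""  # `touch` and unknown commands print nothing in this toy shell


def rollout(
    model: WorldModel, initial_observation: str, actions: List[str]
) -> Tuple[List[Transition], str]:
    """Roll a fixed action sequence through the simulated environment."""
    history: List[Transition] = []
    observation = initial_observation
    for action in actions:
        next_observation = model(history, observation, action)
        history.append(Transition(observation, action))
        observation = next_observation
    return history, observation


if __name__ == "__main__":
    steps, final_observation = rollout(
        toy_world_model, "", ["touch b.txt", "touch a.txt", "ls"]
    )
    print(final_observation)
```

In the decoupled-simulator paradigm, a trained language world model would take the place of `toy_world_model`, and the action sequence would come from the agent policy being trained rather than from a fixed list.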