Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

April 24, 2026
Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
cs.AI

Abstract

As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term "world model" carries different meanings across research communities. We introduce a "levels × laws" taxonomy organized along two axes. The first defines three capability levels: L1 Predictor, which learns one-step local transition operators; L2 Simulator, which composes them into multi-step, action-conditioned rollouts that respect domain laws; and L3 Evolver, which autonomously revises its own model when predictions fail against new evidence. The second identifies four governing-law regimes: physical, digital, social, and scientific. These regimes determine what constraints a world model must satisfy and where it is most likely to fail. Using this framework, we synthesize over 400 works and summarize more than 100 representative systems spanning model-based reinforcement learning, video generation, web and GUI agents, multi-agent social simulation, and AI-driven scientific discovery. We analyze methods, failure modes, and evaluation practices across level-regime pairs, propose decision-centric evaluation principles and a minimal reproducible evaluation package, and outline architectural guidance, open problems, and governance challenges. The resulting roadmap connects previously isolated communities and charts a path from passive next-step prediction toward world models that can simulate, and ultimately reshape, the environments in which agents operate.
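To make the three capability levels more concrete, here is a minimal Python sketch under deliberately toy assumptions: a one-dimensional state, scalar actions, true dynamics of the form s' = s + gain · a with an unknown gain, and a hypothetical hard bound on the state standing in for a domain law. The class names `Predictor`, `Simulator`, and `Evolver`, the least-squares fit, and the error-threshold revision rule are illustrative choices made here, not the survey's method; real L1 to L3 systems use learned neural or symbolic models in far richer state spaces.

```python
from dataclasses import dataclass
from enum import Enum


class Regime(Enum):
    # The four governing-law regimes named in the abstract.
    PHYSICAL = "physical"
    DIGITAL = "digital"
    SOCIAL = "social"
    SCIENTIFIC = "scientific"


# One observed transition: (state, action, next_state).
Transition = tuple[float, float, float]


@dataclass
class Predictor:
    """L1 (toy): a learned one-step transition operator s' = s + gain * a."""
    gain: float = 0.0

    def fit(self, data: list[Transition]) -> None:
        # Least-squares estimate of gain from state deltas (s' - s) against actions.
        num = sum(a * (s2 - s) for s, a, s2 in data)
        den = sum(a * a for _, a, _ in data)
        self.gain = num / den if den else 0.0

    def step(self, s: float, a: float) -> float:
        return s + self.gain * a


@dataclass
class Simulator:
    """L2 (toy): composes one-step predictions into action-conditioned rollouts.

    The 'law' is a hypothetical hard bound [lo, hi] on the state, tagged with
    the regime it is assumed to come from.
    """
    predictor: Predictor
    lo: float = 0.0
    hi: float = 10.0
    regime: Regime = Regime.PHYSICAL

    def rollout(self, s0: float, actions: list[float]) -> list[float]:
        traj = [s0]
        for a in actions:
            s_next = self.predictor.step(traj[-1], a)
            traj.append(min(max(s_next, self.lo), self.hi))  # enforce the law
        return traj


@dataclass
class Evolver:
    """L3 (toy): revises its own model when predictions disagree with new evidence."""
    simulator: Simulator
    tolerance: float = 0.1

    def observe(self, batch: list[Transition]) -> bool:
        """Return True if the model was revised after seeing `batch`."""
        pred = self.simulator.predictor
        err = max(abs(pred.step(s, a) - s2) for s, a, s2 in batch)
        if err > self.tolerance:
            pred.fit(batch)  # assume the world changed: refit on the new evidence
            return True
        return False


if __name__ == "__main__":
    # L1: fit on transitions from a world whose true gain is 2.0.
    predictor = Predictor()
    predictor.fit([(1.0, 1.0, 3.0), (3.0, -1.0, 1.0), (5.0, 0.5, 6.0)])
    # L2: multi-step rollout; the last step is clipped by the state bound.
    sim = Simulator(predictor)
    print(sim.rollout(5.0, [1.0, 1.0, 1.0]))  # [5.0, 7.0, 9.0, 10.0]
    # L3: the world's gain flips to -1.0; the large prediction error triggers a revision.
    evolver = Evolver(sim)
    print(evolver.observe([(4.0, 1.0, 3.0), (6.0, -1.0, 7.0)]), predictor.gain)  # True -1.0
```

The design keeps the levels layered: the simulator only calls the predictor's one-step operator, and the evolver only edits the predictor, so each level adds a capability on top of the one below rather than replacing it.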