Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
April 24, 2026
Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia
cs.AI
Abstract
As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We introduce a "levels x laws" taxonomy organized along two axes. The first defines three capability levels: L1 Predictor, which learns one-step local transition operators; L2 Simulator, which composes them into multi-step, action-conditioned rollouts that respect domain laws; and L3 Evolver, which autonomously revises its own model when predictions fail against new evidence. The second identifies four governing-law regimes: physical, digital, social, and scientific. These regimes determine what constraints a world model must satisfy and where it is most likely to fail. Using this framework, we synthesize over 400 works and summarize more than 100 representative systems spanning model-based reinforcement learning, video generation, web and GUI agents, multi-agent social simulation, and AI-driven scientific discovery. We analyze methods, failure modes, and evaluation practices across level-regime pairs, propose decision-centric evaluation principles and a minimal reproducible evaluation package, and outline architectural guidance, open problems, and governance challenges. The resulting roadmap connects previously isolated communities and charts a path from passive next-step prediction toward world models that can simulate, and ultimately reshape, the environments in which agents operate.
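The three capability levels can be made concrete with a toy example. The following is a minimal sketch under stated assumptions, not the authors' formulation: the scalar state, the linear transition with a single `gain` parameter, the clamping range used as a stand-in "domain law", and the class and method names are all hypothetical choices made for illustration. In particular, the L3 update shown here is a simple one-parameter refit, whereas the abstract describes the Evolver more broadly as revising its own model.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Predictor:
    """L1 (toy): a one-step local transition operator s' = s + gain * a."""
    gain: float  # hypothetical stand-in for learned parameters

    def step(self, state: float, action: float) -> float:
        return state + self.gain * action


@dataclass
class Simulator:
    """L2 (toy): composes one-step predictions into a multi-step,
    action-conditioned rollout that respects a domain constraint."""
    predictor: Predictor
    lower: float = 0.0   # hypothetical "law": state stays in [lower, upper]
    upper: float = 10.0

    def rollout(self, state: float, actions: Sequence[float]) -> List[float]:
        trajectory = [state]
        for action in actions:
            state = self.predictor.step(state, action)
            # Enforce the toy domain law after every predicted step.
            state = min(max(state, self.lower), self.upper)
            trajectory.append(state)
        return trajectory


@dataclass
class Evolver:
    """L3 (toy): revises the model when a prediction conflicts with evidence."""
    simulator: Simulator
    tolerance: float = 0.1

    def observe(self, state: float, action: float, observed_next: float) -> bool:
        """Return True if the model was revised by this observation."""
        predicted = self.simulator.predictor.step(state, action)
        if abs(observed_next - predicted) <= self.tolerance or action == 0:
            return False
        # Refit the single parameter so the prediction matches the evidence.
        self.simulator.predictor.gain = (observed_next - state) / action
        return True


if __name__ == "__main__":
    sim = Simulator(Predictor(gain=1.0))
    print(sim.rollout(0.0, [1.0, 2.0, 20.0]))  # last step is clamped to upper
    evolver = Evolver(sim)
    print(evolver.observe(0.0, 1.0, 2.0))      # prediction 1.0 vs evidence 2.0
    print(sim.rollout(0.0, [1.0, 2.0]))        # rollout under the revised gain
```

The layering mirrors the taxonomy: the Simulator only calls the Predictor's one-step operator, and the Evolver only changes the model when the prediction error exceeds a tolerance.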