World Action Models: A Survey
June 18, 2026
Authors: Qiuhong Shen, Shihua Zhang, Yue Liao, Qi Li, Zhenxiong Tan, Shizun Wang, Shuicheng Yan, Xinchao Wang
cs.AI
Abstract
World Action Models (WAMs) are embodied predictive-action models that turn a forecast of the future into a basis for action. Recent WAMs repurpose large video generation models, while a parallel line of work relies on language or vision-language backbones without a video-generation core. This rapid expansion has blurred the boundaries among broad world models, video generation models, action-grounded video world models, Vision-Language-Action policies, and WAMs. This survey gives the field a common framework. It first clarifies these boundaries, then organizes existing work through two complementary views. The first view asks what each method is required to generate, spanning rendered futures, latent futures, and video-generation-free action reasoning. The second view decomposes each method by predictive substrate, backbone, action coupling, and deployment regime. This anatomy supports a unified discussion of interactability, causality, persistence, physical plausibility, and generalization, followed by data, evaluation, and open challenges. Across these axes, a consistent design pattern emerges: WAMs are not simply video generators with action heads attached, but predictive-action methods whose design choices trade representational richness against compute, memory, latency, and action-label cost. The field is moving toward methods that generate less of the future while preserving the information that control requires. The survey homepage is available at https://world-action-models.github.io/.