LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
March 7, 2026
Authors: Zicheng Duan, Jiatong Xia, Zeyu Zhang, Wenbo Zhang, Gengze Zhou, Chenhui Gou, Yefei He, Feng Chen, Xinyu Zhang, Lingqiao Liu
cs.AI
Abstract
Recent generative video world models aim to simulate visual environment evolution, allowing an observer to interactively explore the scene via camera control. However, they implicitly assume that the world only evolves within the observer's field of view. Once an object leaves the observer's view, its state is "frozen" in memory, and revisiting the same region later often fails to reflect events that should have occurred in the meantime. In this work, we identify and formalize this overlooked limitation as the "out-of-sight dynamics" problem, which prevents video world models from representing a continuously evolving world. To address this issue, we propose LiveWorld, a novel framework that extends video world models to support persistent world evolution. Instead of treating the world as static observational memory, LiveWorld models a persistent global state composed of a static 3D background and dynamic entities that continue evolving even when unobserved. To maintain these unseen dynamics, LiveWorld introduces a monitor-based mechanism that autonomously simulates the temporal progression of active entities and synchronizes their evolved states upon revisiting, ensuring spatially coherent rendering. For evaluation, we further introduce LiveBench, a dedicated benchmark for the task of maintaining out-of-sight dynamics. Extensive experiments show that LiveWorld enables persistent event evolution and long-term scene consistency, bridging the gap between existing 2D observation-based memory and true 4D dynamic world simulation. The baseline and benchmark will be publicly available at https://zichengduan.github.io/LiveWorld/index.html.
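The monitor-based idea in the abstract can be illustrated with a minimal conceptual sketch. All names here (`Entity`, `Monitor`, `advance`, `observe`) are hypothetical and do not come from the paper, and a toy 1D position stands in for the model's learned dynamic state; the point is only to show entities evolving with world time while unobserved and being synchronized on revisit.

```python
import dataclasses

@dataclasses.dataclass
class Entity:
    """Toy stand-in for a dynamic entity in the persistent global state."""
    name: str
    position: float      # 1D state standing in for a full dynamic state
    velocity: float
    last_update: float   # world time of the entity's last state update

    def advance(self, now: float) -> None:
        """Advance the entity's state to world time `now`."""
        dt = now - self.last_update
        self.position += self.velocity * dt
        self.last_update = now


class Monitor:
    """Tracks active entities and simulates them even while unobserved."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def register(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def tick(self, now: float) -> None:
        # Out-of-sight dynamics: every active entity evolves with world
        # time, regardless of whether the observer currently sees it.
        for entity in self.entities.values():
            entity.advance(now)

    def observe(self, name: str, now: float) -> Entity:
        # On revisit, sync the entity to the current world time before
        # rendering, so the observation reflects what happened meanwhile.
        entity = self.entities[name]
        entity.advance(now)
        return entity


monitor = Monitor()
monitor.register(Entity("car", position=0.0, velocity=2.0, last_update=0.0))
monitor.tick(now=5.0)                  # observer looked away for 5 time units
car = monitor.observe("car", now=5.0)
print(car.position)                    # → 10.0 (moved while out of sight)
```

In the actual system this evolution would be driven by the generative model rather than hand-written kinematics, and the synchronized state would condition spatially coherent rendering of the revisited region.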