LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models

Abstract

Recent generative video world models aim to simulate how a visual environment evolves, allowing an observer to explore the scene interactively via camera control. However, they implicitly assume that the world evolves only within the observer's field of view. Once an object leaves the observer's view, its state is "frozen" in memory, and revisiting the same region later often fails to reflect events that should have occurred in the meantime. In this work, we identify and formalize this overlooked limitation as the "out-of-sight dynamics" problem, which prevents video world models from representing a continuously evolving world. To address this issue, we propose LiveWorld, a novel framework that extends video world models to support persistent world evolution. Instead of treating the world as static observational memory, LiveWorld models a persistent global state composed of a static 3D background and dynamic entities that continue evolving even when unobserved. To maintain these unseen dynamics, LiveWorld introduces a monitor-based mechanism that autonomously simulates the temporal progression of active entities and synchronizes their evolved states upon revisiting, ensuring spatially coherent rendering. For evaluation, we further introduce LiveBench, a dedicated benchmark for the task of maintaining out-of-sight dynamics. Extensive experiments show that LiveWorld enables persistent event evolution and long-term scene consistency, bridging the gap between existing 2D observation-based memory and true 4D dynamic world simulation. The baseline and benchmark will be made publicly available at https://zichengduan.github.io/LiveWorld/index.html.
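
The difference between observation-frozen memory and a persistent global state can be illustrated with a toy simulation. The Python sketch below is not LiveWorld's implementation, which works on generated video over a 3D background. Everything in it is a hypothetical simplification chosen for illustration: the names `Entity`, `FrozenMemoryWorld`, `PersistentWorld` and `revisit_demo`, the 1D "location" anchor, the scalar `progress` state, and the choice to have the monitor lazily replay elapsed time when an entity comes back into view.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical toy entity: fixed anchor in the static background, evolving state."""
    name: str
    location: float        # fixed 1D position, standing in for a 3D anchor
    progress: float = 0.0  # evolving state in [0, 1], e.g. how far a candle has burned
    rate: float = 0.1      # progress gained per unit of time

    def advance(self, dt: float) -> None:
        # Evolve the state by dt time units, saturating at 1.0.
        self.progress = min(1.0, self.progress + self.rate * dt)


def in_view(entity: Entity, view: tuple[float, float]) -> bool:
    lo, hi = view
    return lo <= entity.location <= hi


class FrozenMemoryWorld:
    """Baseline: an entity's state evolves only while it is inside the field of view."""

    def __init__(self, entities: list[Entity]) -> None:
        self.entities = entities

    def step(self, view: tuple[float, float], dt: float) -> dict[str, float]:
        visible = {}
        for e in self.entities:
            if in_view(e, view):
                e.advance(dt)  # out-of-view entities are never touched, i.e. "frozen"
                visible[e.name] = e.progress
        return visible


class PersistentWorld:
    """Persistent global state with a hypothetical monitor that tracks a global clock.

    Out-of-view entities are not rendered, but the monitor records when each one was
    last synchronized, so on revisit it fast-forwards the entity by the elapsed time.
    """

    def __init__(self, entities: list[Entity]) -> None:
        self.entities = entities
        self.clock = 0.0
        self.last_sync = {e.name: 0.0 for e in entities}

    def step(self, view: tuple[float, float], dt: float) -> dict[str, float]:
        self.clock += dt
        visible = {}
        for e in self.entities:
            if in_view(e, view):
                # Catch up on everything that happened while the entity was unobserved.
                e.advance(self.clock - self.last_sync[e.name])
                self.last_sync[e.name] = self.clock
                visible[e.name] = e.progress
        return visible


def revisit_demo(world_cls) -> float:
    """Look at the candle, look away for five steps, then look back at it."""
    world = world_cls([Entity("candle", location=0.0)])
    near, away = (-1.0, 1.0), (5.0, 10.0)
    world.step(near, dt=1.0)
    for _ in range(5):
        world.step(away, dt=1.0)
    return world.step(near, dt=1.0)["candle"]


if __name__ == "__main__":
    print(round(revisit_demo(FrozenMemoryWorld), 2))  # 0.2: the five unseen steps are lost
    print(round(revisit_demo(PersistentWorld), 2))    # 0.7: unseen evolution is replayed on revisit
```

A monitor could instead advance every active entity eagerly at each step. For time-only dynamics like this saturating counter, replaying the elapsed time on revisit gives the same state while doing no work for entities nobody is looking at. Dynamics that depend on interactions between unseen entities would need eager simulation.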
