LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models

Abstract

Recent generative video world models aim to simulate how a visual environment evolves, allowing an observer to explore the scene interactively through camera control. However, they implicitly assume that the world evolves only within the observer's field of view. Once an object leaves the observer's view, its state is "frozen" in memory, and revisiting the same region later often fails to reflect events that should have occurred in the meantime. In this work, we identify and formalize this overlooked limitation as the "out-of-sight dynamics" problem, which prevents video world models from representing a continuously evolving world. To address it, we propose LiveWorld, a novel framework that extends video world models to support persistent world evolution. Instead of treating the world as static observational memory, LiveWorld models a persistent global state composed of a static 3D background and dynamic entities that continue evolving even when unobserved. To maintain these unseen dynamics, LiveWorld introduces a monitor-based mechanism that autonomously simulates the temporal progression of active entities and synchronizes their evolved states upon revisiting, ensuring spatially coherent rendering. For evaluation, we further introduce LiveBench, a dedicated benchmark for the task of maintaining out-of-sight dynamics. Extensive experiments show that LiveWorld enables persistent event evolution and long-term scene consistency, bridging the gap between existing 2D observation-based memory and true 4D dynamic world simulation. The baseline and benchmark will be publicly available at https://zichengduan.github.io/LiveWorld/index.html.
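The contrast between frozen observational memory and a persistent, monitored world state can be illustrated with a toy sketch. This is not the LiveWorld implementation: the 1D world, the constant-velocity dynamics, and the names `Entity`, `FrozenMemoryWorld`, `MonitoredWorld`, `visible`, and `run` are all assumptions made for illustration, standing in for the generative components the abstract describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Entity:
    name: str
    x: float  # 1D position (hypothetical stand-in for a full entity state)
    v: float  # constant velocity (hypothetical stand-in for real dynamics)

def visible(x: float, fov: Tuple[float, float]) -> bool:
    """Return True if position x lies inside the camera interval fov."""
    lo, hi = fov
    return lo <= x <= hi

class FrozenMemoryWorld:
    """Observation-only baseline: an entity evolves only while it is in view."""
    def __init__(self, entities: List[Entity]):
        # Copy the entities so each world owns its own state.
        self.entities = {e.name: Entity(e.name, e.x, e.v) for e in entities}

    def step(self, fov: Tuple[float, float], dt: float = 1.0) -> Dict[str, float]:
        for e in self.entities.values():
            if visible(e.x, fov):  # unobserved entities stay frozen
                e.x += e.v * dt
        return {n: e.x for n, e in self.entities.items() if visible(e.x, fov)}

class MonitoredWorld:
    """Persistent state: a monitor advances every entity, observed or not."""
    def __init__(self, entities: List[Entity]):
        self.entities = {e.name: Entity(e.name, e.x, e.v) for e in entities}

    def step(self, fov: Tuple[float, float], dt: float = 1.0) -> Dict[str, float]:
        for e in self.entities.values():
            e.x += e.v * dt  # monitor update, independent of the camera
        # Rendering reads the already-evolved state when the camera returns.
        return {n: e.x for n, e in self.entities.items() if visible(e.x, fov)}

def run(world) -> Dict[str, float]:
    world.step(fov=(-1.0, 1.0))         # the walker is observed once
    for _ in range(5):
        world.step(fov=(100.0, 110.0))  # the camera looks elsewhere
    return world.step(fov=(0.0, 10.0))  # the camera returns

start = [Entity("walker", x=0.0, v=1.0)]
print(run(FrozenMemoryWorld(start)))  # {'walker': 2.0}
print(run(MonitoredWorld(start)))     # {'walker': 7.0}
```

In the baseline the walker resumes from where it was last seen, so the five unobserved steps are lost. In the monitored world the same revisit shows the walker five units further along, which is the behavior the out-of-sight dynamics problem asks for.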
