Spatia: Video Generation with Updatable Spatial Memory

Abstract

Existing video generation models struggle to maintain long-term spatial and temporal consistency because video signals are dense and high-dimensional. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly maintains a 3D scene point cloud as a persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates the memory through visual SLAM. This dynamic-static disentanglement improves spatial consistency throughout generation while preserving the model's ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.
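The generate-then-update loop described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: `SpatialMemory`, `generate_video`, and the two toy callables are hypothetical names introduced here, the clip generator stands in for the video model, and the point estimator stands in for a visual SLAM system. Frames and camera poses are reduced to plain numbers so the control flow is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Point = Tuple[float, float, float]  # one 3D point of the static scene
Frame = Tuple[float, int]           # toy frame: (camera pose, memory size it was conditioned on)
Clip = List[Frame]
CameraPath = List[float]            # toy 1D camera poses (assumption for illustration)


@dataclass
class SpatialMemory:
    """Hypothetical persistent memory: a de-duplicated set of static 3D points."""
    points: Set[Point] = field(default_factory=set)

    def update(self, new_points: List[Point]) -> None:
        # Merge newly estimated static points into the persistent memory.
        self.points.update(new_points)


def generate_video(
    memory: SpatialMemory,
    camera_paths: List[CameraPath],
    generate_clip: Callable[[SpatialMemory, CameraPath], Clip],
    estimate_static_points: Callable[[Clip], List[Point]],
) -> List[Clip]:
    """Generate clips one by one, each conditioned on the current memory."""
    clips: List[Clip] = []
    for path in camera_paths:
        clip = generate_clip(memory, path)           # stand-in for the video model
        memory.update(estimate_static_points(clip))  # stand-in for visual SLAM
        clips.append(clip)
    return clips


def toy_generate_clip(memory: SpatialMemory, path: CameraPath) -> Clip:
    # Toy generator: records how many memory points were available per frame.
    return [(pose, len(memory.points)) for pose in path]


def toy_estimate_static_points(clip: Clip) -> List[Point]:
    # Toy "SLAM": pretends each frame reveals one static point at its pose.
    return [(pose, 0.0, 0.0) for pose, _ in clip]


memory = SpatialMemory()
clips = generate_video(
    memory,
    [[0.0, 1.0], [1.0, 2.0]],
    toy_generate_clip,
    toy_estimate_static_points,
)
```

In this toy run the second clip is conditioned on the points recovered from the first, and the overlapping pose contributes only one point to the memory. That mirrors, in a very reduced form, the idea that revisited regions reuse the stored static geometry rather than being regenerated from scratch.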

