MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
July 14, 2025
Authors: Chenguo Lin, Yuchen Lin, Panwang Pan, Yifan Yu, Honglei Yan, Katerina Fragkiadaki, Yadong Mu
cs.AI
Abstract
We present MoVieS, a novel feed-forward model that synthesizes 4D dynamic
novel views from monocular videos in one second. MoVieS represents dynamic 3D
scenes using pixel-aligned grids of Gaussian primitives, explicitly supervising
their time-varying motion. This allows, for the first time, the unified
modeling of appearance, geometry and motion, and enables view synthesis,
reconstruction and 3D point tracking within a single learning-based framework.
By bridging novel view synthesis with dynamic geometry reconstruction, MoVieS
enables large-scale training on diverse datasets with minimal dependence on
task-specific supervision. As a result, it also naturally supports a wide range
of zero-shot applications, such as scene flow estimation and moving object
segmentation. Extensive experiments validate the effectiveness and efficiency
of MoVieS across multiple tasks, achieving competitive performance while
offering speedups of several orders of magnitude.
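The abstract's core idea is a pixel-aligned grid of Gaussian primitives whose centers are displaced over time by an explicitly supervised motion field. The sketch below is only an illustration of that representation under assumed names, shapes, and parameterization; it is not the authors' implementation.

```python
import torch

# Hypothetical pixel-aligned Gaussian grid for an H x W frame over T timesteps.
# Each pixel carries one Gaussian primitive (center, scale, rotation, opacity,
# color) plus a per-timestep motion offset; all names and shapes are assumptions.
H, W, T = 32, 32, 8  # toy resolution and number of timesteps

centers   = torch.zeros(H, W, 3)   # 3D position of each Gaussian (one per pixel)
scales    = torch.ones(H, W, 3)    # anisotropic scale
rotations = torch.zeros(H, W, 4)   # unit quaternion (w, x, y, z)
rotations[..., 0] = 1.0
opacities = torch.ones(H, W, 1)
colors    = torch.zeros(H, W, 3)

# Time-varying motion: a per-timestep displacement field, so the same set of
# primitives can be advected to any query time t.
motion = torch.zeros(T, H, W, 3)

def centers_at(t: int) -> torch.Tensor:
    """Gaussian centers advected to timestep t (toy version of the idea)."""
    return centers + motion[t]

# Example: the displaced centers at t=3 keep the pixel-aligned layout.
print(centers_at(3).shape)  # torch.Size([32, 32, 3])
```

Under this kind of layout, the difference of advected centers between two timesteps would directly give a scene-flow-like quantity, which is consistent with the zero-shot applications mentioned in the abstract.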