SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
July 24, 2024
Authors: Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani
cs.AI
Abstract
We present Stable Video 4D (SV4D), a latent video diffusion model for
multi-frame and multi-view consistent dynamic 3D content generation. Unlike
previous methods that rely on separately trained generative models for video
generation and novel view synthesis, we design a unified diffusion model to
generate novel view videos of dynamic 3D objects. Specifically, given a
monocular reference video, SV4D generates novel views for each video frame that
are temporally consistent. We then use the generated novel view videos to
optimize an implicit 4D representation (dynamic NeRF) efficiently, without the
need for cumbersome SDS-based optimization used in most prior works. To train
our unified novel view video generation model, we curated a dynamic 3D object
dataset from the existing Objaverse dataset. Extensive experimental results on
multiple datasets and user studies demonstrate SV4D's state-of-the-art
performance on novel-view video synthesis as well as 4D generation compared to
prior works.