Diffusion Priors for Dynamic View Synthesis from Monocular Videos
January 10, 2024
Authors: Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Junli Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov
cs.AI
Abstract
Dynamic novel view synthesis aims to capture the temporal evolution of visual
content within videos. Existing methods struggle to distinguish between
motion and structure, particularly in scenarios where camera poses are either
unknown or constrained compared to object motion. Furthermore, with information
solely from reference images, it is extremely challenging to hallucinate unseen
regions that are occluded or partially observed in the given videos. To address
these issues, we first finetune a pretrained RGB-D diffusion model on the video
frames using a customization technique. Subsequently, we distill the knowledge
from the finetuned model into a 4D representation encompassing both dynamic and
static Neural Radiance Fields (NeRF) components. The proposed pipeline achieves
geometric consistency while preserving the scene identity. We perform thorough
experiments to evaluate the efficacy of the proposed method qualitatively and
quantitatively. Our results demonstrate the robustness and utility of our
approach in challenging cases, further advancing dynamic novel view synthesis.
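
The pipeline summarized above (finetune an RGB-D diffusion prior on the input video, then distill it into a 4D representation with static and dynamic NeRF components) could be sketched roughly as below. This is a minimal, hypothetical sketch, not the authors' implementation: the `ToyField` MLPs standing in for NeRF volume rendering, the `FrozenRGBDPrior` placeholder, the simplified noise schedule, and the unweighted score-distillation update are all illustrative assumptions.

```python
# Hypothetical sketch of the distillation stage: a frozen, finetuned RGB-D
# diffusion prior supervises a 4D representation combining a static and a
# dynamic component. Toy MLPs stand in for NeRF rendering; names and shapes
# are assumptions for illustration only.
import torch
import torch.nn as nn

class ToyField(nn.Module):
    """Maps pixel coordinates (plus time for the dynamic branch) to RGB-D."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 4))  # 3 RGB channels + 1 depth

    def forward(self, coords):
        return self.net(coords)

class FrozenRGBDPrior(nn.Module):
    """Placeholder for the finetuned RGB-D diffusion model (kept frozen)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 4))

    @torch.no_grad()
    def predict_noise(self, noisy_rgbd, t):
        # A real diffusion model would also condition on the timestep t.
        return self.net(noisy_rgbd)

static_field = ToyField(in_dim=2)   # (u, v)
dynamic_field = ToyField(in_dim=3)  # (u, v, time)
prior = FrozenRGBDPrior()
opt = torch.optim.Adam(list(static_field.parameters()) +
                       list(dynamic_field.parameters()), lr=1e-3)

for step in range(100):
    # Sample a batch of pixel coordinates (proxy for a novel view) and a time.
    uv = torch.rand(1024, 2)
    time = torch.rand(1024, 1)
    # Composite static and dynamic components into an RGB-D rendering.
    rgbd = static_field(uv) + dynamic_field(torch.cat([uv, time], dim=-1))

    # Score-distillation-style update: perturb the rendering, let the frozen
    # prior predict the noise, and nudge the rendering toward the prior.
    t = torch.randint(1, 1000, (1,))
    noise = torch.randn_like(rgbd)
    noisy = rgbd + 0.1 * (t.float() / 1000) * noise  # simplified noise schedule
    eps_pred = prior.predict_noise(noisy, t)
    grad = (eps_pred - noise).detach()               # timestep weighting omitted
    loss = (grad * rgbd).sum()

    opt.zero_grad()
    loss.backward()
    opt.step()
```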