Shape of Motion: 4D Reconstruction from a Single Video
July 18, 2024
Authors: Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, Angjoo Kanazawa
cs.AI
Abstract
Monocular dynamic reconstruction is a challenging and long-standing vision
problem due to the highly ill-posed nature of the task. Existing approaches are
limited in that they either depend on templates, are effective only in
quasi-static scenes, or fail to model 3D motion explicitly. In this work, we
introduce a method capable of reconstructing generic dynamic scenes, featuring
explicit, full-sequence-long 3D motion, from casually captured monocular
videos. We tackle the under-constrained nature of the problem with two key
insights: First, we exploit the low-dimensional structure of 3D motion by
representing scene motion with a compact set of SE(3) motion bases. Each point's
motion is expressed as a linear combination of these bases, facilitating soft
decomposition of the scene into multiple rigidly-moving groups. Second, we
utilize a comprehensive set of data-driven priors, including monocular depth
maps and long-range 2D tracks, and devise a method to effectively consolidate
these noisy supervisory signals, resulting in a globally consistent
representation of the dynamic scene. Experiments show that our method achieves
state-of-the-art performance for both long-range 3D/2D motion estimation and
novel view synthesis on dynamic scenes. Project Page:
https://shape-of-motion.github.io/
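The first insight above — expressing each point's motion as a linear combination of a compact set of SE(3) motion bases — can be pictured with a short sketch. This is a minimal illustration rather than the paper's implementation: it assumes translations are blended linearly and rotations are blended by sign-aligned quaternion averaging (one common way to keep a weighted combination close to a valid rigid transform); all names and array layouts here are hypothetical.

```python
import numpy as np

def blend_se3_bases(basis_quats, basis_trans, weights):
    """Blend B SE(3) motion bases into per-point rigid transforms.

    basis_quats: (B, 4) unit quaternions, rotation of each basis at one timestep
    basis_trans: (B, 3) translation of each basis at the same timestep
    weights:     (N, B) per-point blend weights; each row sums to 1 and acts
                 as a soft assignment of the point to rigidly-moving groups

    Returns (N, 4) per-point quaternions and (N, 3) per-point translations.
    """
    # Translations blend linearly, as in linear blend skinning.
    t = weights @ basis_trans  # (N, 3)

    # Quaternions: flip signs onto one hemisphere (q and -q encode the same
    # rotation), take the weighted sum, and renormalize. This approximates
    # rotation averaging when the basis rotations are not too far apart.
    ref = basis_quats[0]
    signs = np.where(basis_quats @ ref >= 0.0, 1.0, -1.0)  # (B,)
    q = weights @ (basis_quats * signs[:, None])            # (N, 4)
    q /= np.linalg.norm(q, axis=-1, keepdims=True)
    return q, t
```

Because the per-point weights are shared across time while the bases themselves evolve, points with similar weight rows move together, which yields the soft decomposition into rigidly-moving groups described in the abstract.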
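The second insight consolidates monocular depth maps with long-range 2D tracks. Below is a simplified, hypothetical sketch of the basic lifting step — back-projecting a tracked pixel with its predicted depth through a pinhole camera to obtain a (noisy) world-space point; chaining this over a track gives a noisy 3D trajectory that the optimization must reconcile. The function name and the known-pose/pinhole assumptions are mine, not the paper's API, and the paper's actual consolidation of these signals is more involved.

```python
import numpy as np

def lift_track_point_to_3d(uv, depth, K, cam_to_world):
    """Lift one tracked 2D point into world space using monocular depth.

    uv:           (2,) pixel coordinates of the tracked point at one frame
    depth:        scalar depth predicted for that pixel (noisy, scale-ambiguous)
    K:            (3, 3) camera intrinsics
    cam_to_world: (4, 4) camera-to-world extrinsics for the frame

    Returns a (3,) world-space point.
    """
    # Back-project the pixel ray and scale it by the predicted depth.
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p_cam = ray * depth
    # Transform from camera to world coordinates.
    p_world = cam_to_world @ np.append(p_cam, 1.0)
    return p_world[:3]
```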