Shape of Motion: 4D Reconstruction from a Single Video
July 18, 2024
Authors: Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, Angjoo Kanazawa
cs.AI
Abstract
Monocular dynamic reconstruction is a challenging and long-standing vision
problem due to the highly ill-posed nature of the task. Existing approaches are
limited in that they either depend on templates, are effective only in
quasi-static scenes, or fail to model 3D motion explicitly. In this work, we
introduce a method capable of reconstructing generic dynamic scenes, featuring
explicit, full-sequence-long 3D motion, from casually captured monocular
videos. We tackle the under-constrained nature of the problem with two key
insights: First, we exploit the low-dimensional structure of 3D motion by
representing scene motion with a compact set of SE(3) motion bases. Each point's
motion is expressed as a linear combination of these bases, facilitating soft
decomposition of the scene into multiple rigidly-moving groups. Second, we
utilize a comprehensive set of data-driven priors, including monocular depth
maps and long-range 2D tracks, and devise a method to effectively consolidate
these noisy supervisory signals, resulting in a globally consistent
representation of the dynamic scene. Experiments show that our method achieves
state-of-the-art performance for both long-range 3D/2D motion estimation and
novel view synthesis on dynamic scenes. Project Page:
https://shape-of-motion.github.io/
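The abstract's core mechanism, expressing each point's motion as a linear combination of a compact set of SE(3) motion bases, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names are hypothetical, and the blend here is a plain weighted sum of 4x4 transform matrices (linear-blend-skinning style, so the result is only approximately rigid), whereas the paper may combine the bases differently.

```python
import numpy as np

def blend_se3_bases(bases_t, weights):
    """Blend K rigid-motion bases into per-point transforms.

    bases_t: (K, 4, 4) SE(3) basis transforms at one timestep.
    weights: (N, K) per-point blending weights (rows sum to 1);
             the soft assignment of points to rigidly-moving groups.
    Returns: (N, 4, 4) blended per-point transforms. Note: a weighted
    sum of SE(3) matrices is only approximately rigid.
    """
    return np.einsum('nk,kij->nij', weights, bases_t)

def transform_points(points, transforms):
    """Apply per-point 4x4 transforms to 3D points.

    points: (N, 3); transforms: (N, 4, 4). Returns (N, 3).
    """
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    moved = np.einsum('nij,nj->ni', transforms, homo)
    return moved[:, :3]

# Toy usage: N scene points, K motion bases, one timestep.
N, K = 1000, 10
points = np.random.randn(N, 3)
weights = np.random.rand(N, K)
weights /= weights.sum(axis=1, keepdims=True)  # soft group assignment
bases = np.tile(np.eye(4), (K, 1, 1))          # identity motion here
moved = transform_points(points, blend_se3_bases(bases, weights))
```

Because K is small relative to the number of points, the per-point weights give a low-dimensional parameterization of scene motion, which is what lets the method softly decompose the scene into a few rigidly-moving groups.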