ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
August 29, 2024
Authors: Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan
cs.AI
Abstract
Advancements in 3D scene reconstruction have transformed 2D images from the
real world into 3D models, producing realistic 3D results from hundreds of
input photos. Despite great success in dense-view reconstruction scenarios,
rendering a detailed scene from insufficient captured views is still an
ill-posed optimization problem, often resulting in artifacts and distortions in
unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction
paradigm that reframes the ambiguous reconstruction challenge as a temporal
generation task. The key insight is to unleash the strong generative prior of
large pre-trained video diffusion models for sparse-view reconstruction.
However, video frames generated directly from pre-trained models often fail
to preserve accurate 3D view consistency. To address this, given limited
input views, the proposed ReconX first constructs a global point cloud and
encodes it into a contextual space as the 3D structure condition. Guided by the
condition, the video diffusion model then synthesizes video frames that both
preserve detail and exhibit a high degree of 3D consistency, ensuring the
coherence of the scene from various perspectives. Finally, we recover the 3D
scene from the generated video through a confidence-aware 3D Gaussian Splatting
optimization scheme. Extensive experiments on various real-world datasets show
the superiority of our ReconX over state-of-the-art methods in terms of quality
and generalizability.
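
The abstract outlines a three-stage flow: build a global point cloud from the sparse input views, encode it into a contextual space as a 3D structure condition for the video diffusion model, and fit a 3D Gaussian Splatting scene to the generated frames under a confidence-aware objective. The snippet below is a minimal PyTorch sketch of one plausible reading of that last stage, where per-pixel confidence in the generated frames down-weights unreliable regions during optimization; the function name, tensor shapes, and confidence definition are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch (not the authors' code) of a confidence-weighted
# photometric loss for the 3D Gaussian Splatting optimization stage.
# Shapes and the confidence map are assumptions for illustration.
import torch

def confidence_aware_l1(rendered: torch.Tensor,
                        generated: torch.Tensor,
                        confidence: torch.Tensor) -> torch.Tensor:
    """Confidence-weighted L1 between 3DGS renders and generated frames.

    rendered, generated: (B, 3, H, W) images in [0, 1]
    confidence:          (B, 1, H, W) per-pixel weights in [0, 1]
    """
    per_pixel = (rendered - generated).abs()  # per-pixel reconstruction error
    return (confidence * per_pixel).mean()    # unreliable pixels contribute less

# Toy usage: random tensors stand in for real renders, diffusion-generated
# frames, and a confidence map; gradients flow back to the rendered image.
if __name__ == "__main__":
    B, H, W = 2, 64, 64
    rendered = torch.rand(B, 3, H, W, requires_grad=True)  # 3DGS render
    generated = torch.rand(B, 3, H, W)                     # generated frame
    conf = torch.rand(B, 1, H, W)                          # confidence map
    loss = confidence_aware_l1(rendered, generated, conf)
    loss.backward()
    print(f"confidence-aware loss: {loss.item():.4f}")
```

In a full pipeline, the rendered images would come from a differentiable Gaussian rasterizer and the confidence map from an uncertainty estimate over the diffusion outputs, so that regions the generative model is unsure about exert less pull on the recovered 3D scene.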