ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
August 29, 2024
Authors: Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan
cs.AI
Abstract
Advancements in 3D scene reconstruction have transformed 2D images from the
real world into 3D models, producing realistic 3D results from hundreds of
input photos. Despite great success in dense-view reconstruction scenarios,
rendering a detailed scene from insufficient captured views is still an
ill-posed optimization problem, often resulting in artifacts and distortions in
unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction
paradigm that reframes the ambiguous reconstruction challenge as a temporal
generation task. The key insight is to unleash the strong generative prior of
large pre-trained video diffusion models for sparse-view reconstruction.
However, video frames generated directly by pre-trained models struggle to
preserve 3D view consistency accurately. To address this, given limited
input views, the proposed ReconX first constructs a global point cloud and
encodes it into a contextual space as the 3D structure condition. Guided by the
condition, the video diffusion model then synthesizes video frames that
both preserve detail and exhibit a high degree of 3D consistency, ensuring the
coherence of the scene from various perspectives. Finally, we recover the 3D
scene from the generated video through a confidence-aware 3D Gaussian Splatting
optimization scheme. Extensive experiments on various real-world datasets show
the superiority of our ReconX over state-of-the-art methods in terms of quality
and generalizability.
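
The abstract does not spell out how the confidence-aware optimization weights the generated frames. As a rough illustration only, the sketch below shows one generic way a per-pixel confidence could enter a photometric loss: each residual is scaled by its confidence, and a log penalty keeps the confidences from collapsing to zero. The function name `confidence_weighted_l1`, the `alpha` parameter, and the exact form of the loss are assumptions made for this example, not the formulation used by ReconX.

```python
import math


def confidence_weighted_l1(rendered, target, confidence, alpha=0.1):
    """Mean of c * |rendered - target| - alpha * log(c) over all pixels.

    Hypothetical illustration: `rendered` and `target` are flat lists of
    pixel intensities, `confidence` holds one positive weight per pixel.
    This is an assumed loss form, not the one defined in the paper.
    """
    if not (len(rendered) == len(target) == len(confidence)):
        raise ValueError("inputs must have the same length")
    if not rendered:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for r, t, c in zip(rendered, target, confidence):
        if c <= 0.0:
            raise ValueError("confidence values must be positive")
        # A low confidence shrinks the residual term; the log term
        # charges a price for lowering the confidence.
        total += c * abs(r - t) - alpha * math.log(c)
    return total / len(rendered)


# Toy data: the third pixel of the generated frame disagrees with the render.
rendered = [0.2, 0.5, 0.9]
target = [0.2, 0.4, 0.1]

uniform = confidence_weighted_l1(rendered, target, [1.0, 1.0, 1.0])
downweighted = confidence_weighted_l1(rendered, target, [1.0, 1.0, 0.25])
```

With the inconsistent pixel down-weighted, the loss is lower than with uniform confidence, so an optimizer driven by such a loss would be pulled less strongly toward unreliable regions of a generated frame.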