MVDD: Multi-View Depth Diffusion Models
December 8, 2023
Authors: Zhen Wang, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang
cs.AI
Abstract
Denoising diffusion models have demonstrated outstanding results in 2D image
generation, yet it remains a challenge to replicate their success in 3D shape
generation. In this paper, we propose leveraging multi-view depth, which
represents complex 3D shapes in a 2D data format that is easy to denoise. We
pair this representation with a diffusion model, MVDD, that is capable of
generating high-quality, dense point clouds of 20K+ points with fine-grained
details. To enforce 3D consistency in multi-view depth, we introduce an
epipolar line segment attention that conditions the denoising step for a view
on its neighboring views. Additionally, a depth fusion module is incorporated
into diffusion steps to further ensure the alignment of depth maps. When
augmented with surface reconstruction, MVDD can also produce high-quality 3D
meshes. Furthermore, MVDD stands out in other tasks such as depth completion,
and can serve as a 3D prior, significantly boosting many downstream tasks, such
as GAN inversion. State-of-the-art results from extensive experiments
demonstrate MVDD's excellent ability in 3D shape generation, depth completion,
and its potential as a 3D prior for downstream tasks.
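To make the cross-view conditioning concrete, below is a minimal, illustrative PyTorch sketch of an epipolar-style attention layer in which each pixel of a reference view attends to neighbor-view features sampled along the corresponding epipolar segment, obtained by unprojecting the pixel at several candidate depths and reprojecting those points into the neighboring view. The module name, the fixed depth range, and the single-neighbor setup are assumptions made for illustration and do not reflect the paper's actual implementation.

```python
# Illustrative sketch only: epipolar line segment attention between two views.
# All names (EpipolarSegmentAttention, num_hypotheses, the camera conventions)
# are assumptions for this example, not the authors' code.

import torch
import torch.nn.functional as F
from torch import nn


class EpipolarSegmentAttention(nn.Module):
    """Each reference-view pixel attends to features sampled at a few points
    along the epipolar segment in a neighboring view. The segment is traced by
    unprojecting the pixel at candidate depths and projecting into the neighbor."""

    def __init__(self, feat_dim: int, num_hypotheses: int = 8):
        super().__init__()
        self.num_hypotheses = num_hypotheses
        self.to_q = nn.Linear(feat_dim, feat_dim)
        self.to_k = nn.Linear(feat_dim, feat_dim)
        self.to_v = nn.Linear(feat_dim, feat_dim)

    def forward(self, ref_feat, nbr_feat, K, ref_pose, nbr_pose,
                depth_min=0.5, depth_max=2.0):
        # ref_feat, nbr_feat: (B, C, H, W) feature maps of the two views
        # K: (B, 3, 3) intrinsics; ref_pose, nbr_pose: (B, 4, 4) camera-to-world
        B, C, H, W = ref_feat.shape
        device = ref_feat.device

        # Homogeneous pixel grid of the reference view, shape (B, 3, H*W)
        ys, xs = torch.meshgrid(
            torch.arange(H, device=device, dtype=torch.float32),
            torch.arange(W, device=device, dtype=torch.float32),
            indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(3, -1)
        pix = pix.unsqueeze(0).expand(B, -1, -1)

        # Candidate depths spanning an assumed scene range
        depths = torch.linspace(depth_min, depth_max, self.num_hypotheses,
                                device=device)

        keys, vals = [], []
        for d in depths:
            # Unproject reference pixels at depth d, lift to world space
            cam_pts = torch.linalg.solve(K, pix) * d                   # (B, 3, N)
            cam_pts_h = torch.cat([cam_pts, torch.ones_like(cam_pts[:, :1])], 1)
            world_pts = ref_pose @ cam_pts_h                           # (B, 4, N)

            # Project the 3D points into the neighboring view
            nbr_cam = torch.linalg.inv(nbr_pose) @ world_pts
            proj = K @ nbr_cam[:, :3]
            uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)            # (B, 2, N)

            # Normalize to [-1, 1] and bilinearly sample neighbor features
            grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                                uv[:, 1] / (H - 1) * 2 - 1], dim=-1)   # (B, N, 2)
            sampled = F.grid_sample(nbr_feat, grid.unsqueeze(1),
                                    align_corners=True)                # (B, C, 1, N)
            sampled = sampled.squeeze(2).permute(0, 2, 1)              # (B, N, C)
            keys.append(self.to_k(sampled))
            vals.append(self.to_v(sampled))

        k = torch.stack(keys, dim=2)                                   # (B, N, D, C)
        v = torch.stack(vals, dim=2)
        q = self.to_q(ref_feat.flatten(2).permute(0, 2, 1)).unsqueeze(2)

        # Attention over the D samples along each pixel's epipolar segment
        attn = torch.softmax((q * k).sum(-1) / C ** 0.5, dim=-1)       # (B, N, D)
        out = (attn.unsqueeze(-1) * v).sum(2)                          # (B, N, C)
        return out.permute(0, 2, 1).view(B, C, H, W)
```

In a full multi-view denoiser, such a layer would be applied inside each diffusion step so that every view's depth prediction is conditioned on its neighbors; the depth fusion described in the abstract would then further reconcile the denoised maps into a consistent point cloud.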