MVDD: Multi-View Depth Diffusion Models
December 8, 2023
Authors: Zhen Wang, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang
cs.AI
Abstract
Denoising diffusion models have demonstrated outstanding results in 2D image
generation, yet it remains a challenge to replicate their success in 3D shape
generation. In this paper, we propose leveraging multi-view depth, which
represents complex 3D shapes in a 2D data format that is easy to denoise. We
pair this representation with a diffusion model, MVDD, that is capable of
generating high-quality, dense point clouds of 20K+ points with fine-grained
details. To enforce 3D consistency in multi-view depth, we introduce an
epipolar line segment attention that conditions the denoising step for a view
on its neighboring views. Additionally, a depth fusion module is incorporated
into diffusion steps to further ensure the alignment of depth maps. When
augmented with surface reconstruction, MVDD can also produce high-quality 3D
meshes. Furthermore, MVDD stands out in other tasks such as depth completion,
and can serve as a 3D prior, significantly boosting many downstream tasks, such
as GAN inversion. State-of-the-art results from extensive experiments
demonstrate MVDD's excellent ability in 3D shape generation, depth completion,
and its potential as a 3D prior for downstream tasks.
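To make the cross-view conditioning concrete, below is a minimal, illustrative PyTorch sketch of an epipolar-style attention layer in which each pixel of a reference view attends to neighbor-view features sampled along the corresponding epipolar segment, obtained by unprojecting the pixel at several candidate depths and reprojecting those points into the neighboring view. The module name, the fixed depth range, and the single-neighbor setup are assumptions made for illustration and do not reflect the paper's actual implementation.

```python
# Illustrative sketch only: epipolar line segment attention between two views.
# All names (EpipolarSegmentAttention, num_hypotheses, the camera conventions)
# are assumptions for this example, not the authors' code.

import torch
import torch.nn.functional as F
from torch import nn


class EpipolarSegmentAttention(nn.Module):
    """Each reference-view pixel attends to features sampled at a few points
    along the epipolar segment in a neighboring view. The segment is traced by
    unprojecting the pixel at candidate depths and projecting into the neighbor."""

    def __init__(self, feat_dim: int, num_hypotheses: int = 8):
        super().__init__()
        self.num_hypotheses = num_hypotheses
        self.to_q = nn.Linear(feat_dim, feat_dim)
        self.to_k = nn.Linear(feat_dim, feat_dim)
        self.to_v = nn.Linear(feat_dim, feat_dim)

    def forward(self, ref_feat, nbr_feat, K, ref_pose, nbr_pose,
                depth_min=0.5, depth_max=2.0):
        # ref_feat, nbr_feat: (B, C, H, W) feature maps of the two views
        # K: (B, 3, 3) intrinsics; ref_pose, nbr_pose: (B, 4, 4) camera-to-world
        B, C, H, W = ref_feat.shape
        device = ref_feat.device

        # Homogeneous pixel grid of the reference view, shape (B, 3, H*W)
        ys, xs = torch.meshgrid(
            torch.arange(H, device=device, dtype=torch.float32),
            torch.arange(W, device=device, dtype=torch.float32),
            indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(3, -1)
        pix = pix.unsqueeze(0).expand(B, -1, -1)

        # Candidate depths spanning an assumed scene range
        depths = torch.linspace(depth_min, depth_max, self.num_hypotheses,
                                device=device)

        keys, vals = [], []
        for d in depths:
            # Unproject reference pixels at depth d, lift to world space
            cam_pts = torch.linalg.solve(K, pix) * d                   # (B, 3, N)
            cam_pts_h = torch.cat([cam_pts, torch.ones_like(cam_pts[:, :1])], 1)
            world_pts = ref_pose @ cam_pts_h                           # (B, 4, N)

            # Project the 3D points into the neighboring view
            nbr_cam = torch.linalg.inv(nbr_pose) @ world_pts
            proj = K @ nbr_cam[:, :3]
            uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)            # (B, 2, N)

            # Normalize to [-1, 1] and bilinearly sample neighbor features
            grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                                uv[:, 1] / (H - 1) * 2 - 1], dim=-1)   # (B, N, 2)
            sampled = F.grid_sample(nbr_feat, grid.unsqueeze(1),
                                    align_corners=True)                # (B, C, 1, N)
            sampled = sampled.squeeze(2).permute(0, 2, 1)              # (B, N, C)
            keys.append(self.to_k(sampled))
            vals.append(self.to_v(sampled))

        k = torch.stack(keys, dim=2)                                   # (B, N, D, C)
        v = torch.stack(vals, dim=2)
        q = self.to_q(ref_feat.flatten(2).permute(0, 2, 1)).unsqueeze(2)

        # Attention over the D samples along each pixel's epipolar segment
        attn = torch.softmax((q * k).sum(-1) / C ** 0.5, dim=-1)       # (B, N, D)
        out = (attn.unsqueeze(-1) * v).sum(2)                          # (B, N, C)
        return out.permute(0, 2, 1).view(B, C, H, W)
```

In a full multi-view denoiser, such a layer would be applied inside each diffusion step so that every view's depth prediction is conditioned on its neighbors; the depth fusion described in the abstract would then further reconcile the denoised maps into a consistent point cloud.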