MVDD: Multi-View Depth Diffusion Models
December 8, 2023
Authors: Zhen Wang, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang
cs.AI
Abstract
Denoising diffusion models have demonstrated outstanding results in 2D image
generation, yet it remains a challenge to replicate this success in 3D shape
generation. In this paper, we propose leveraging multi-view depth, which
represents complex 3D shapes in a 2D data format that is easy to denoise. We
pair this representation with a diffusion model, MVDD, that is capable of
generating high-quality dense point clouds with 20K+ points and fine-grained
details. To enforce 3D consistency in multi-view depth, we introduce an
epipolar line segment attention that conditions the denoising step for a view
on its neighboring views. Additionally, a depth fusion module is incorporated
into diffusion steps to further ensure the alignment of depth maps. When
augmented with surface reconstruction, MVDD can also produce high-quality 3D
meshes. Furthermore, MVDD stands out in other tasks such as depth completion,
and can serve as a 3D prior, significantly boosting many downstream tasks, such
as GAN inversion. State-of-the-art results from extensive experiments
demonstrate MVDD's excellent ability in 3D shape generation, depth completion,
and its potential as a 3D prior for downstream tasks.
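The cross-view conditioning described in the abstract can be sketched roughly as follows: when denoising one depth view, each pixel attends only to features sampled along its epipolar line segment in a neighboring view. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation; the function name, tensor shapes, and the shared key/value features are assumptions made for brevity.

```python
import numpy as np

def epipolar_segment_attention(q, neighbor_feat, epi_samples):
    """Hypothetical sketch of epipolar line segment attention:
    each query pixel of the view being denoised attends to K features
    sampled along its epipolar line segment in a neighboring view."""
    # q: (N, D) query features for pixels of the view being denoised
    # neighbor_feat: (M, D) features of the neighboring view
    # epi_samples: (N, K) integer indices of K samples on each pixel's
    #              epipolar line segment within the neighbor view
    d = q.shape[-1]
    k = neighbor_feat[epi_samples]                 # (N, K, D) gathered keys
    v = k                                          # shared key/value features (simplification)
    logits = np.einsum('nd,nkd->nk', q, k) / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)   # for numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over the K segment samples
    return np.einsum('nk,nkd->nd', attn, v)        # (N, D) cross-view context

# Usage: 10 query pixels, a 12-feature neighbor view, 4 samples per segment.
q = np.random.randn(10, 16)
neighbor = np.random.randn(12, 16)
samples = np.random.randint(0, 12, size=(10, 4))
context = epipolar_segment_attention(q, neighbor, samples)
```

Restricting attention to the epipolar segment, rather than the full neighbor image, keeps the cost linear in the number of segment samples while still providing the geometric correspondence needed for 3D-consistent denoising.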