MVDD: Multi-View Depth Diffusion Models
December 8, 2023
Authors: Zhen Wang, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang
cs.AI
Abstract
Denoising diffusion models have demonstrated outstanding results in 2D image
generation, yet it remains a challenge to replicate this success in 3D shape
generation. In this paper, we propose leveraging multi-view depth, which
represents complex 3D shapes in a 2D data format that is easy to denoise. We
pair this representation with a diffusion model, MVDD, that is capable of
generating high-quality dense point clouds with 20K+ points and fine-grained
details. To enforce 3D consistency in multi-view depth, we introduce an
epipolar line segment attention that conditions the denoising step for a view
on its neighboring views. Additionally, a depth fusion module is incorporated
into diffusion steps to further ensure the alignment of depth maps. When
augmented with surface reconstruction, MVDD can also produce high-quality 3D
meshes. Furthermore, MVDD stands out in other tasks such as depth completion,
and can serve as a 3D prior, significantly boosting many downstream tasks, such
as GAN inversion. State-of-the-art results from extensive experiments
demonstrate MVDD's excellent ability in 3D shape generation, depth completion,
and its potential as a 3D prior for downstream tasks.
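The cross-view conditioning described in the abstract can be sketched roughly as follows: when denoising one depth view, each pixel attends only to features sampled along its epipolar line segment in a neighboring view. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation; the function name, tensor shapes, and the shared key/value features are assumptions made for brevity.

```python
import numpy as np

def epipolar_segment_attention(q, neighbor_feat, epi_samples):
    """Hypothetical sketch of epipolar line segment attention:
    each query pixel of the view being denoised attends to K features
    sampled along its epipolar line segment in a neighboring view."""
    # q: (N, D) query features for pixels of the view being denoised
    # neighbor_feat: (M, D) features of the neighboring view
    # epi_samples: (N, K) integer indices of K samples on each pixel's
    #              epipolar line segment within the neighbor view
    d = q.shape[-1]
    k = neighbor_feat[epi_samples]                 # (N, K, D) gathered keys
    v = k                                          # shared key/value features (simplification)
    logits = np.einsum('nd,nkd->nk', q, k) / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)   # for numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over the K segment samples
    return np.einsum('nk,nkd->nd', attn, v)        # (N, D) cross-view context

# Usage: 10 query pixels, a 12-feature neighbor view, 4 samples per segment.
q = np.random.randn(10, 16)
neighbor = np.random.randn(12, 16)
samples = np.random.randint(0, 12, size=(10, 4))
context = epipolar_segment_attention(q, neighbor, samples)
```

Restricting attention to the epipolar segment, rather than the full neighbor image, keeps the cost linear in the number of segment samples while still providing the geometric correspondence needed for 3D-consistent denoising.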