MVDD: Multi-View Depth Diffusion Models

Abstract

Denoising diffusion models have demonstrated outstanding results in 2D image generation, yet replicating that success in 3D shape generation remains a challenge. In this paper, we propose leveraging multi-view depth, which represents complex 3D shapes in a 2D data format that is easy to denoise. We pair this representation with a diffusion model, MVDD, that is capable of generating high-quality dense point clouds with 20K+ points and fine-grained details. To enforce 3D consistency in multi-view depth, we introduce epipolar line segment attention, which conditions the denoising step for a view on its neighboring views. Additionally, a depth fusion module is incorporated into the diffusion steps to further ensure the alignment of depth maps. When augmented with surface reconstruction, MVDD can also produce high-quality 3D meshes. Furthermore, MVDD stands out in other tasks such as depth completion, and can serve as a 3D prior, significantly boosting many downstream tasks, such as GAN inversion. State-of-the-art results from extensive experiments demonstrate MVDD's excellent ability in 3D shape generation and depth completion, and its potential as a 3D prior for downstream tasks.
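
To make the multi-view depth representation concrete, the following is a minimal sketch of how a set of depth maps could be lifted into a single point cloud. It is not the authors' implementation: the pinhole camera model, the intrinsics (fx, fy, cx, cy), the camera-to-world pose (rotation, translation), the convention that non-positive depth marks background, and the function names backproject_depth and fuse_views are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """Lift one depth map to world-space points (pinhole camera assumed)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                # Assumption: non-positive depth marks a background pixel.
                continue
            # Pixel (u, v) with depth z -> point in camera coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Camera coordinates -> world coordinates: R @ cam + t.
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(world)
    return points


def fuse_views(views: Sequence[Dict]) -> List[Point]:
    """Concatenate the back-projected points of every view into one cloud."""
    cloud: List[Point] = []
    for view in views:
        cloud.extend(backproject_depth(**view))
    return cloud


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    view = dict(
        depth=[[1.0, 0.0], [2.0, 2.0]],  # one background pixel
        fx=1.0, fy=1.0, cx=0.0, cy=0.0,
        rotation=identity, translation=[0.0, 0.0, 0.0],
    )
    print(fuse_views([view]))
```

Under these assumptions, each foreground pixel contributes one 3D point, so the size of the fused cloud grows with the number of views and the depth-map resolution. This sketch simply concatenates views and does not attempt the consistency mechanisms named in the abstract (epipolar line segment attention and depth fusion).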
