DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
November 15, 2023
Authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang
cs.AI
Abstract
We propose DMV3D, a novel 3D generation approach that uses a
transformer-based 3D large reconstruction model to denoise multi-view
diffusion. Our reconstruction model incorporates a triplane NeRF representation
and can denoise noisy multi-view images via NeRF reconstruction and rendering,
achieving single-stage 3D generation in ~30s on a single A100 GPU. We train
DMV3D on large-scale multi-view image datasets of highly diverse
objects using only image reconstruction losses, without accessing 3D assets. We
demonstrate state-of-the-art results for the single-image reconstruction
problem where probabilistic modeling of unseen object parts is required for
generating diverse reconstructions with sharp textures. We also show
high-quality text-to-3D generation results outperforming previous 3D diffusion
models. Our project website is at: https://justimyhxu.github.io/projects/dmv3d/.
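The core idea of single-stage generation — using a reconstruction model as the diffusion denoiser, so that predicting clean multi-view images amounts to reconstructing and rendering a 3D representation — can be illustrated with a toy sampling loop. This is a minimal sketch, not the paper's implementation: `reconstruct_and_render` stands in for the transformer-based triplane-NeRF reconstructor (here it simply returns a fixed "clean" view set), and the schedule and DDIM-style update are generic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VIEWS, H, W = 4, 8, 8            # toy multi-view set at tiny resolution
T = 50                             # number of diffusion steps

# Generic variance schedule (assumption, not the paper's schedule)
betas = np.linspace(1e-4, 0.2, T)
abar = np.cumprod(1.0 - betas)     # cumulative alpha-bar

# The "clean" multi-view images the toy denoiser will predict
target = np.ones((N_VIEWS, H, W))

def reconstruct_and_render(x_t, t):
    """Stand-in for DMV3D's denoiser: the real model regresses a
    triplane NeRF from the noisy views (x_t, t) and renders denoised
    views at the input cameras. This toy version just returns the
    known clean views so the loop is runnable."""
    return target

# Single-stage sampling: start from pure noise, denoise by "reconstruction"
x = rng.standard_normal((N_VIEWS, H, W))
for t in reversed(range(T)):
    x0 = reconstruct_and_render(x, t)          # predicted clean views
    if t > 0:
        # Deterministic DDIM-style (eta = 0) update toward x0
        eps = (x - np.sqrt(abar[t]) * x0) / np.sqrt(1.0 - abar[t])
        x = np.sqrt(abar[t - 1]) * x0 + np.sqrt(1.0 - abar[t - 1]) * eps
    else:
        x = x0                                 # final step outputs the clean views
```

With a perfect toy denoiser the loop converges exactly to `target`; in DMV3D the analogous loop yields a triplane NeRF as a by-product of the final denoising step, which is what makes the generation single-stage.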