DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

Abstract

We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model as the denoiser in multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and denoises noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in about 30 seconds on a single A100 GPU. We train DMV3D on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without accessing 3D assets. We demonstrate state-of-the-art results on the single-image reconstruction problem, where probabilistic modeling of unseen object parts is required to generate diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results that outperform previous 3D diffusion models. Project website: https://justimyhxu.github.io/projects/dmv3d/
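
The idea of denoising multi-view images through reconstruction and rendering can be illustrated with a small sampling loop. The sketch below is not the authors' implementation: the function names, the cosine noise schedule, and the deterministic DDIM-style update are all assumptions chosen for illustration, and `reconstruct_and_render` is a hypothetical stand-in that simply averages the views where the real model would build a triplane NeRF and render it back to the input viewpoints.

```python
import math
import random


def cosine_alpha_bar(t, num_steps):
    """Cumulative signal level at step t (assumed cosine schedule, not from the paper)."""
    return math.cos(0.5 * math.pi * t / (num_steps + 1)) ** 2


def reconstruct_and_render(noisy_views, t):
    """Hypothetical placeholder for the reconstruction-based denoiser.

    A real model would reconstruct a 3D representation from the noisy views
    and render it at the same viewpoints. Here every view is replaced by the
    per-pixel mean across views, only so that the sketch runs end to end.
    """
    num_views = len(noisy_views)
    mean_view = [sum(pixels) / num_views for pixels in zip(*noisy_views)]
    return [list(mean_view) for _ in range(num_views)]


def sample(denoiser, num_views=4, num_pixels=8, num_steps=10, seed=0):
    """Deterministic DDIM-style sampling with a denoiser that predicts clean views."""
    rng = random.Random(seed)
    # Start every view from pure Gaussian noise (views are flat pixel lists).
    x = [[rng.gauss(0.0, 1.0) for _ in range(num_pixels)] for _ in range(num_views)]
    for t in range(num_steps, 0, -1):
        a_t = cosine_alpha_bar(t, num_steps)
        a_prev = cosine_alpha_bar(t - 1, num_steps)
        x0 = denoiser(x, t)  # predicted clean views, consistent across viewpoints
        new_x = []
        for view, clean in zip(x, x0):
            row = []
            for x_t, x0_hat in zip(view, clean):
                # Noise implied by the current sample and the clean prediction.
                eps = (x_t - math.sqrt(a_t) * x0_hat) / math.sqrt(1.0 - a_t)
                # Move to the previous, less noisy step.
                row.append(math.sqrt(a_prev) * x0_hat + math.sqrt(1.0 - a_prev) * eps)
            new_x.append(row)
        x = new_x
    return x


views = sample(reconstruct_and_render)
print(len(views), len(views[0]))
```

Because the denoiser returns rendered views of a single underlying reconstruction, the final samples agree across viewpoints by construction; in this toy version that shows up as all views being identical after the last step.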

