MVDream: Multi-view Diffusion for 3D Generation

Abstract

We propose MVDream, a multi-view diffusion model that generates geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets together with a multi-view dataset rendered from 3D assets, the resulting model achieves both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can therefore serve as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation (the DreamBooth3D application), where consistency is maintained after the model learns the subject identity.
