MVDream: Multi-view Diffusion for 3D Generation

Abstract

We propose MVDream, a multi-view diffusion model that generates geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets together with a multi-view dataset rendered from 3D assets, the resulting model achieves both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can therefore serve as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation (the DreamBooth3D application), where consistency is maintained after the subject identity is learned.
