MVDream: Multi-view Diffusion for 3D Generation

Abstract

We propose MVDream, a multi-view diffusion model that generates geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets together with a multi-view dataset rendered from 3D assets, the resulting model achieves both the generalizability of 2D diffusion and the consistency of 3D data. It can therefore serve as a multi-view prior for 3D generation via Score Distillation Sampling (SDS), where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned in a few-shot setting for personalized 3D generation (i.e., a DreamBooth3D application), where consistency is maintained after the model learns the subject's identity.
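To make the Score Distillation Sampling step concrete, the following is a minimal toy sketch in Python, not the MVDream implementation. It follows the standard SDS gradient w(t) * (eps_hat - eps) * dx/dtheta, which skips the denoiser's Jacobian. Several assumptions are made and are labeled as such. The "renderer" is the identity (the rendered image x is the parameter vector theta itself). The prior is a hypothetical noise predictor, `toy_noise_predictor`, whose data distribution is a single point `target`. The weighting w(t) = 1 - alpha_bar is only one common choice. The multi-view prior described above, with camera-conditioned joint denoising of several rendered views, is not modeled at all.

```python
import math
import random


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a diffusion model's noise predictor eps_phi.

    Assumes the data distribution is a single point `target`; the optimal
    noise estimate is then (x_t - sqrt(alpha_bar) * target) / sqrt(1 - alpha_bar).
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - a * y) / s for xt, y in zip(x_t, target)]


def sds_step(theta, target, lr, rng):
    """One SDS update, assuming an identity renderer (x = theta, dx/dtheta = I)."""
    alpha_bar = rng.uniform(0.02, 0.98)           # random noise level, i.e. a random timestep t
    eps = [rng.gauss(0.0, 1.0) for _ in theta]    # sampled Gaussian noise
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + s * e for x, e in zip(theta, eps)]  # forward-diffuse the "rendering"
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    w = 1.0 - alpha_bar                           # assumed weighting w(t)
    # SDS gradient: w(t) * (eps_hat - eps), times dx/dtheta = I here.
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [x - lr * g for x, g in zip(theta, grad)]


# Usage: optimize a 3-element "scene" toward the toy prior's mode.
rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5]
for _ in range(200):
    theta = sds_step(theta, target, lr=0.5, rng=rng)
```

Because the toy predictor is optimal for its point-mass distribution, eps_hat - eps reduces to sqrt(alpha_bar) * (x - target) / sqrt(1 - alpha_bar), so every step pulls theta toward `target` regardless of the sampled noise. A real text-conditioned prior instead pulls renderings toward images the diffusion model considers likely for the prompt.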
