MVD^2: Efficient Multiview 3D Reconstruction for Multiview Diffusion

Abstract

Multiview diffusion (MVD) is a promising 3D generation technique that has attracted considerable attention for its generalizability, quality, and efficiency. By finetuning large pretrained image diffusion models on 3D data, MVD methods first generate multiple views of a 3D object from an image or text prompt and then recover the 3D shape through multiview 3D reconstruction. However, the sparse views and inconsistent details in the generated images make 3D reconstruction challenging. We present MVD^2, an efficient 3D reconstruction method for MVD images. MVD^2 aggregates image features into a 3D feature volume by projection and convolution and then decodes the volumetric features into a 3D mesh. We train MVD^2 with 3D shape collections and MVD images prompted by rendered views of the 3D shapes. To address the discrepancy between the generated multiview images and the ground-truth views of the 3D shapes, we design a simple-yet-efficient view-dependent training scheme. MVD^2 improves the 3D generation quality of MVD, is fast, and is robust to various MVD methods. After training, it can decode 3D meshes from multiview images within one second. We train MVD^2 with Zero-123++ and the Objaverse-LVIS 3D dataset and demonstrate its superior performance in generating 3D models from multiview images produced by different MVD methods, using both synthetic and real images as prompts.
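The projection-based aggregation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pinhole camera model, nearest-neighbor sampling, plain averaging across views, and all function names (`make_voxel_centers`, `project_points`, `aggregate_features`) are assumptions made for illustration. The learned 3D convolutions and the mesh decoder described in the abstract are omitted.

```python
import numpy as np

def make_voxel_centers(res, half_extent=1.0):
    """Return (res**3, 3) voxel-center coordinates inside a cube around the origin."""
    axis = (np.arange(res) + 0.5) / res * 2.0 * half_extent - half_extent
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)

def project_points(points, K, R, t):
    """Pinhole projection (assumed camera model): world points -> pixels and depth."""
    cam = points @ R.T + t                            # world frame -> camera frame
    depth = cam[:, 2]
    safe_depth = np.where(depth > 1e-8, depth, 1.0)   # avoid dividing by zero
    uv = (cam @ K.T)[:, :2] / safe_depth[:, None]
    return uv, depth

def aggregate_features(feature_maps, Ks, Rs, ts, res):
    """Average per-view features over the views in which each voxel is visible.

    feature_maps: array of shape (V, H, W, C), one feature map per view.
    Returns a (res, res, res, C) feature volume; unseen voxels stay zero.
    """
    V, H, W, C = feature_maps.shape
    points = make_voxel_centers(res)
    total = np.zeros((points.shape[0], C))
    count = np.zeros((points.shape[0], 1))
    for v in range(V):
        uv, depth = project_points(points, Ks[v], Rs[v], ts[v])
        cols = np.rint(uv[:, 0]).astype(int)          # nearest-neighbor sampling
        rows = np.rint(uv[:, 1]).astype(int)
        valid = (depth > 1e-8) & (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)
        total[valid] += feature_maps[v, rows[valid], cols[valid]]
        count[valid] += 1.0
    volume = total / np.maximum(count, 1.0)
    return volume.reshape(res, res, res, C)
```

In a full pipeline of this kind, the resulting volume would typically be refined by learned 3D convolutions before a decoder extracts the mesh. Averaging is only one possible way to fuse views; it is used here because it is the simplest choice that tolerates a varying number of visible views per voxel.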
