Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Abstract

Existing single image-to-3D creation methods typically involve a two-stage process: they first generate multi-view images and then use these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, which degrades the quality of the reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which integrates diffusion-based multi-view image generation and 3D reconstruction into a recursive diffusion process. In our framework, these two modules are jointly trained through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During the multi-view denoising process, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. The recursive diffusion framework with 3D-aware feedback unites the entire process and improves geometric consistency. Experiments show that our framework outperforms both pipelines that keep these two stages separate and existing methods that combine them at the inference phase. Project page: https://costwen.github.io/Ouroboros3D/
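
The recursive feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `denoise_views`, `reconstruct_3d`, and `render_maps` are hypothetical stand-ins built from simple NumPy arithmetic, and only the control flow (denoise, reconstruct, render, feed the rendered maps into the next denoising step) follows the abstract. Joint training of the two modules is not shown.

```python
import numpy as np

def denoise_views(noisy_views, cond_maps):
    """Hypothetical stand-in for the multi-view diffusion model.

    cond_maps is None at the first step; afterwards it holds the maps
    rendered from the previous step's reconstruction.
    """
    pred = 0.9 * noisy_views  # toy "denoising": shrink the noise
    if cond_maps is not None:
        # Toy conditioning: blend in the 3D-aware feedback.
        pred = 0.5 * pred + 0.5 * cond_maps
    return pred

def reconstruct_3d(pred_views):
    """Hypothetical stand-in for the reconstruction module:
    fuses all predicted views into one shared representation."""
    return pred_views.mean(axis=0)

def render_maps(rep, num_views):
    """Hypothetical stand-in renderer: one map per view,
    all derived from the same shared representation."""
    return np.broadcast_to(rep, (num_views,) + rep.shape).copy()

def recursive_sample(num_views=4, size=8, steps=5, seed=0):
    """Run the toy recursive loop and return the final
    representation and the final rendered maps."""
    rng = np.random.default_rng(seed)
    views = rng.standard_normal((num_views, size, size))
    cond_maps = None  # no 3D-aware feedback before the first step
    rep = None
    for step in range(steps, 0, -1):
        pred = denoise_views(views, cond_maps)
        rep = reconstruct_3d(pred)
        # Feedback used as an additional condition at the next step.
        cond_maps = render_maps(rep, num_views)
        # Toy update: re-inject a shrinking amount of noise.
        noise_scale = 0.1 * (step - 1) / steps
        views = pred + noise_scale * rng.standard_normal(pred.shape)
    return rep, cond_maps

rep, maps = recursive_sample()
```

Because every rendered map comes from the same shared representation, the conditioning signal is identical across views by construction in this toy version, which is the intuition behind using rendered 3D-aware maps to encourage geometric consistency.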
