One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Abstract

Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering finer-grained control than their text-to-3D counterparts. However, most existing models fall short of simultaneously providing rapid generation and high fidelity to the input image, two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness both the extensive knowledge embedded in 2D diffusion models and the priors from valuable yet limited 3D data. This is achieved by first fine-tuning a 2D diffusion model for consistent multi-view image generation, and then lifting these images to 3D with the aid of multi-view-conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image. Our project webpage: https://sudo-ai-3d.github.io/One2345plus_page.
