Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Abstract

Text-to-3D generation with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and the Janus (multi-face) problem, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm: the first stage generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and the second stage directly regresses a NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate high-quality, diverse, and Janus-free 3D assets within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Project webpage: https://jiahao.ai/instant3d/.
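As a rough check of the speed claim, the short sketch below converts the figures quoted in the abstract (20 seconds versus 1 to 10 hours) into speedup factors. The function name `speedup_factor` and the variable names are illustrative assumptions, not part of the Instant3D code, and the 20-second figure is treated as an upper bound.

```python
import math


def speedup_factor(baseline_hours: float, method_seconds: float) -> float:
    """Return how many times faster a method is than a baseline."""
    return baseline_hours * 3600 / method_seconds


# Figures quoted in the abstract; names here are illustrative only.
instant3d_seconds = 20      # generation finishes "within 20 seconds"
baseline_hours = (1, 10)    # optimization-based methods: "1 to 10 hours"

for hours in baseline_hours:
    factor = speedup_factor(hours, instant3d_seconds)
    print(f"{hours} h baseline: {factor:.0f}x faster, "
          f"about {math.log10(factor):.1f} orders of magnitude")
# 1 h baseline: 180x faster, about 2.3 orders of magnitude
# 10 h baseline: 1800x faster, about 3.3 orders of magnitude
```

Under these numbers the speedup ranges from roughly 180x to 1800x, so "two orders of magnitude" corresponds to the conservative end of the range.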
