DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation

Abstract

Text-to-image diffusion models pre-trained on billions of image-text pairs have recently enabled text-to-3D content creation by optimizing a randomly initialized Neural Radiance Field (NeRF) with score distillation. However, the resulting 3D models exhibit two limitations: (a) quality concerns such as saturated color and the Janus problem; (b) extremely low diversity compared to text-guided image synthesis. In this paper, we show that the conflict between the NeRF optimization process and uniform timestep sampling in score distillation is the main reason for these limitations. To resolve this conflict, we propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns NeRF optimization with the sampling process of the diffusion model. Extensive experiments show that our simple redesign significantly improves text-to-3D content creation, yielding higher quality and diversity.
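To make the contrast between uniform and prioritized timestep sampling concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract only says that the schedule is monotonically non-increasing, so the linear decay used here, along with the function names `uniform_timestep` and `non_increasing_timestep` and the parameters `num_iterations`, `num_timesteps`, and `t_min`, are illustrative assumptions.

```python
import random


def uniform_timestep(num_timesteps, rng):
    """Baseline: draw a diffusion timestep uniformly at random,
    independent of how far the NeRF optimization has progressed."""
    return rng.randint(1, num_timesteps)


def non_increasing_timestep(iteration, num_iterations, num_timesteps, t_min=1):
    """Hypothetical schedule: map the optimization iteration to a timestep
    that never increases as training proceeds.

    A linear decay is assumed here purely for illustration; the source
    does not specify which non-increasing function is used.
    """
    if num_iterations < 2:
        return num_timesteps
    progress = iteration / (num_iterations - 1)  # 0.0 at start, 1.0 at end
    t = num_timesteps - progress * (num_timesteps - t_min)
    return max(t_min, int(t))


# Early iterations use large (noisy) timesteps, late iterations small ones.
schedule = [non_increasing_timestep(i, 10, 1000) for i in range(10)]
```

In this sketch, early optimization steps see heavily noised renderings and later steps see lightly noised ones, mirroring the coarse-to-fine order in which a diffusion sampler denoises an image. The uniform baseline has no such ordering.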
