HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance

Abstract

Automatic text-to-3D synthesis has advanced remarkably through the optimization of 3D models. Existing methods commonly rely on pre-trained text-to-image generative models, such as diffusion models, to score 2D renderings of Neural Radiance Fields (NeRFs); these scores then guide the optimization of the NeRFs. However, because these methods have a limited understanding of 3D geometry, they often produce artifacts and inconsistencies across views. To address these limitations, we propose a reformulation of the optimization loss that uses the diffusion prior. We further introduce a novel training approach that unlocks the potential of the diffusion prior. To improve the 3D geometry representation, we apply auxiliary depth supervision to NeRF-rendered images and regularize the density field of the NeRFs. Extensive experiments demonstrate that our method outperforms prior work, achieving higher photo-realism and improved multi-view consistency.
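The scheme in which a frozen diffusion model scores NeRF renderings is commonly implemented as score distillation sampling (SDS), introduced in DreamFusion: noise is added to a rendering, the diffusion model predicts that noise, and the weighted residual w(t)(ε̂ − ε) is backpropagated through the renderer. The following is a minimal NumPy sketch of that baseline update, not of HiFA's reformulated loss, which the abstract does not spell out. All toy pieces are assumptions: the DDPM-style linear β schedule, the weighting w(t) = 1 − ᾱ_t, the timestep range, an identity "renderer" standing in for a NeRF, and the hypothetical oracle denoiser `make_oracle_eps_pred`, whose data distribution is a single target image.

```python
import numpy as np


def linear_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def sds_grad(x, eps_pred, t, alphas_cumprod, rng):
    """SDS gradient w.r.t. the rendered image x: w(t) * (eps_hat - eps).

    The gradient w.r.t. scene parameters theta multiplies this by dx/dtheta
    (the renderer's Jacobian); autodiff would normally supply that factor.
    """
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x.shape)                     # sampled noise
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * eps  # forward diffusion
    eps_hat = eps_pred(x_t, t)                             # frozen model's noise estimate
    w = 1.0 - a_bar                                        # assumed weighting w(t)
    return w * (eps_hat - eps)


def make_oracle_eps_pred(target, alphas_cumprod):
    """Hypothetical denoiser that is exact when the data is one fixed image."""
    def eps_pred(x_t, t):
        a_bar = alphas_cumprod[t]
        return (x_t - np.sqrt(a_bar) * target) / np.sqrt(1.0 - a_bar)
    return eps_pred


rng = np.random.default_rng(0)
alphas_cumprod = linear_alphas_cumprod()
target = rng.uniform(size=(4, 4, 3))  # toy "image" the prior prefers
theta = np.zeros_like(target)         # toy scene parameters
eps_pred = make_oracle_eps_pred(target, alphas_cumprod)

lr = 1.0
for _ in range(300):
    t = int(rng.integers(20, 981))    # assumed timestep range [20, 980]
    x = theta                         # identity "renderer": dx/dtheta = I
    theta = theta - lr * sds_grad(x, eps_pred, t, alphas_cumprod, rng)

print("max |theta - target|:", np.abs(theta - target).max())
```

With this oracle, ε̂ − ε reduces to sqrt(ᾱ_t / (1 − ᾱ_t)) (x − target), so the sampled noise cancels and each step shrinks the error by a factor of 1 − sqrt(ᾱ_t (1 − ᾱ_t)). A real text-conditioned denoiser and a NeRF renderer give a noisier, view-dependent signal, and the Jacobian dx/dθ is no longer the identity.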
