DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation

Abstract

Text-to-image diffusion models pre-trained on billions of image-text pairs have recently enabled text-to-3D content creation by optimizing a randomly initialized Neural Radiance Field (NeRF) with score distillation. However, the resulting 3D models exhibit two limitations: (a) quality problems such as saturated colors and the Janus problem; (b) extremely low diversity compared to text-guided image synthesis. In this paper, we show that the conflict between the NeRF optimization process and uniform timestep sampling in score distillation is the main reason for these limitations. To resolve this conflict, we propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns NeRF optimization with the sampling process of the diffusion model. Extensive experiments show that our simple redesign significantly improves text-to-3D content creation, yielding higher quality and diversity.
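
To make the contrast between the two sampling strategies concrete, the following is a minimal sketch, not the paper's actual schedule. It compares uniform timestep sampling with one simple monotonically non-increasing mapping from optimization step to diffusion timestep. The linear decay, the value `T_MAX = 1000`, and all function names are assumptions chosen for illustration only; the abstract does not specify which non-increasing function is used.

```python
import random

T_MAX = 1000  # assumed number of diffusion timesteps (illustrative value)


def uniform_timestep(rng, t_min=1, t_max=T_MAX):
    """Baseline: draw a timestep uniformly, independent of the optimization step."""
    return rng.randint(t_min, t_max)  # both bounds are inclusive


def non_increasing_timestep(step, total_steps, t_min=1, t_max=T_MAX):
    """Hypothetical schedule: decay linearly from t_max down to t_min.

    Early optimization steps see large (noisy) timesteps and late steps see
    small ones, mirroring the coarse-to-fine order of diffusion sampling.
    """
    frac = step / max(total_steps - 1, 1)  # progress in [0, 1]
    return round(t_max - frac * (t_max - t_min))


def is_non_increasing(values):
    """Check that a sequence never goes up from one element to the next."""
    return all(a >= b for a, b in zip(values, values[1:]))


rng = random.Random(0)
uniform_draws = [uniform_timestep(rng) for _ in range(10)]
schedule = [non_increasing_timestep(i, 10) for i in range(10)]

print("uniform:       ", uniform_draws)
print("non-increasing:", schedule)
```

In this sketch the uniform draws can jump between high and low noise levels at any point in the optimization, whereas the scheduled timesteps start at `T_MAX` and never increase. Any other non-increasing function could be substituted for the linear decay.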
