HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance
May 30, 2023
Authors: Joseph Zhu, Peiye Zhuang
cs.AI
Abstract
Automatic text-to-3D synthesis has achieved remarkable advancements through
the optimization of 3D models. Existing methods commonly rely on pre-trained
text-to-image generative models, such as diffusion models, which provide scores
for 2D renderings of Neural Radiance Fields (NeRFs) that are then used to
optimize the NeRFs. However, these methods often encounter artifacts and
inconsistencies across multiple views due to their limited understanding of 3D
geometry. To address these limitations, we propose a reformulation of the
optimization loss using the diffusion prior. Furthermore, we introduce a novel
training approach that unlocks the potential of the diffusion prior. To improve
3D geometry representation, we apply auxiliary depth supervision for
NeRF-rendered images and regularize the density field of NeRFs. Extensive
experiments demonstrate the superiority of our method over prior works,
resulting in advanced photo-realism and improved multi-view consistency.
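The core mechanism described above — a pretrained diffusion model scoring 2D renderings of a NeRF to drive its optimization — can be sketched as a score-distillation-style gradient. The snippet below is a minimal illustration, not HiFA's actual reformulated loss; `denoiser`, the weighting `w`, and the noise schedule are all stand-in assumptions.

```python
import numpy as np

def score_distillation_grad(rendered, denoiser, t, alpha_bar, rng):
    """Sketch of a score-distillation-style gradient on a NeRF rendering.

    rendered  : flattened 2D rendering of the NeRF, shape (d,)
    denoiser  : stand-in for a pretrained diffusion model's noise predictor
    t         : diffusion timestep index
    alpha_bar : cumulative noise-schedule array (alpha-bar values)
    """
    eps = rng.standard_normal(rendered.shape)               # sampled Gaussian noise
    a = alpha_bar[t]
    noisy = np.sqrt(a) * rendered + np.sqrt(1.0 - a) * eps  # forward-diffuse the rendering
    eps_pred = denoiser(noisy, t)                           # diffusion model predicts the noise
    w = 1.0 - a                                             # a common (assumed) timestep weighting
    # Gradient pushed back onto the rendering (and thus the NeRF parameters)
    return w * (eps_pred - eps)

# Toy usage with a placeholder "denoiser" in place of a real diffusion model
rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.999, 0.01, 1000)
fake_denoiser = lambda x, t: 0.1 * x                        # hypothetical stand-in network
grad = score_distillation_grad(
    rng.standard_normal(64), fake_denoiser, t=500, alpha_bar=alpha_bar, rng=rng
)
```

In a full pipeline this gradient would be backpropagated through the differentiable renderer into the NeRF's density and color fields at every iteration.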