HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance
May 30, 2023
Authors: Joseph Zhu, Peiye Zhuang
cs.AI
Abstract
Automatic text-to-3D synthesis has achieved remarkable advancements through
the optimization of 3D models. Existing methods commonly rely on pre-trained
text-to-image generative models, such as diffusion models, which provide scores
for 2D renderings of Neural Radiance Fields (NeRFs) that are then used to
optimize the NeRFs. However, these methods often encounter artifacts and
inconsistencies across multiple views due to their limited understanding of 3D
geometry. To address these limitations, we propose a reformulation of the
optimization loss using the diffusion prior. Furthermore, we introduce a novel
training approach that unlocks the potential of the diffusion prior. To improve
3D geometry representation, we apply auxiliary depth supervision for
NeRF-rendered images and regularize the density field of NeRFs. Extensive
experiments demonstrate the superiority of our method over prior works,
resulting in advanced photo-realism and improved multi-view consistency.
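The core mechanism described above — a pretrained diffusion model scoring 2D renderings of a NeRF to drive its optimization — can be sketched as a score-distillation-style gradient. The snippet below is a minimal illustration, not HiFA's actual reformulated loss; `denoiser`, the weighting `w`, and the noise schedule are all stand-in assumptions.

```python
import numpy as np

def score_distillation_grad(rendered, denoiser, t, alpha_bar, rng):
    """Sketch of a score-distillation-style gradient on a NeRF rendering.

    rendered  : flattened 2D rendering of the NeRF, shape (d,)
    denoiser  : stand-in for a pretrained diffusion model's noise predictor
    t         : diffusion timestep index
    alpha_bar : cumulative noise-schedule array (alpha-bar values)
    """
    eps = rng.standard_normal(rendered.shape)               # sampled Gaussian noise
    a = alpha_bar[t]
    noisy = np.sqrt(a) * rendered + np.sqrt(1.0 - a) * eps  # forward-diffuse the rendering
    eps_pred = denoiser(noisy, t)                           # diffusion model predicts the noise
    w = 1.0 - a                                             # a common (assumed) timestep weighting
    # Gradient pushed back onto the rendering (and thus the NeRF parameters)
    return w * (eps_pred - eps)

# Toy usage with a placeholder "denoiser" in place of a real diffusion model
rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.999, 0.01, 1000)
fake_denoiser = lambda x, t: 0.1 * x                        # hypothetical stand-in network
grad = score_distillation_grad(
    rng.standard_normal(64), fake_denoiser, t=500, alpha_bar=alpha_bar, rng=rng
)
```

In a full pipeline this gradient would be backpropagated through the differentiable renderer into the NeRF's density and color fields at every iteration.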