HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance
May 30, 2023
Authors: Joseph Zhu, Peiye Zhuang
cs.AI
Abstract
Automatic text-to-3D synthesis has achieved remarkable advancements through
the optimization of 3D models. Existing methods commonly rely on pre-trained
text-to-image generative models, such as diffusion models, which provide
scores for 2D renderings of Neural Radiance Fields (NeRFs) that are then used
to optimize the NeRFs. However, these methods often produce artifacts and
inconsistencies across views due to their limited understanding of 3D
geometry. To address these limitations, we propose a reformulation of the
optimization loss using the diffusion prior. Furthermore, we introduce a novel
training approach that unlocks the potential of the diffusion prior. To
improve the 3D geometry representation, we apply auxiliary depth supervision
to NeRF-rendered images and regularize the density field of NeRFs. Extensive
experiments demonstrate that our method outperforms prior works, achieving
greater photo-realism and improved multi-view consistency.
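To make the setup described in the abstract concrete, below is a minimal sketch of the standard score-distillation-style objective that such text-to-3D methods build on, in which a frozen text-to-image diffusion model scores noised NeRF renderings and the resulting gradient drives NeRF optimization. This is not the reformulated loss proposed in this paper; `unet`, `text_embeddings`, and `alphas_cumprod` are placeholders standing in for the components of a pre-trained diffusion model.

```python
# Sketch of a score-distillation-style loss for optimizing a NeRF with a
# frozen text-to-image diffusion model. Placeholder names, not HiFA's
# reformulated objective.
import torch
import torch.nn.functional as F

def sds_style_loss(rendered_rgb, unet, text_embeddings, alphas_cumprod):
    """rendered_rgb: (B, 3, H, W) NeRF rendering in [0, 1], requires grad."""
    B = rendered_rgb.shape[0]
    x0 = rendered_rgb * 2.0 - 1.0                      # rescale to [-1, 1]

    # Sample a diffusion timestep and add the corresponding amount of noise.
    t = torch.randint(20, 980, (B,), device=x0.device)
    noise = torch.randn_like(x0)
    a_t = alphas_cumprod[t].view(B, 1, 1, 1)
    x_t = a_t.sqrt() * x0 + (1.0 - a_t).sqrt() * noise

    # Predict the noise with the frozen diffusion model (classifier-free
    # guidance with an unconditional branch is omitted for brevity).
    with torch.no_grad():
        eps_pred = unet(x_t, t, text_embeddings)

    # Standard score distillation: push the weighted residual
    # (eps_pred - noise) onto the rendering. Detaching the target turns the
    # update into a simple MSE surrogate whose gradient w.r.t. x0 equals grad.
    w = 1.0 - a_t
    grad = w * (eps_pred - noise)
    target = (x0 - grad).detach()
    return 0.5 * F.mse_loss(x0, target, reduction="sum") / B
```

In practice the returned loss is backpropagated through the NeRF renderer only; the diffusion model stays frozen throughout optimization.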