StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D
December 2, 2023
Authors: Pengsheng Guo, Hans Hao, Adam Caccavale, Zhongzheng Ren, Edward Zhang, Qi Shan, Aditya Sankar, Alexander G. Schwing, Alex Colburn, Fangchang Ma
cs.AI
Abstract
In the realm of text-to-3D generation, utilizing 2D diffusion models through
score distillation sampling (SDS) frequently leads to issues such as blurred
appearances and multi-faced geometry, primarily due to the intrinsically noisy
nature of the SDS loss. Our analysis identifies the core of these challenges as
the interaction among noise levels in the 2D diffusion process, the
architecture of the diffusion network, and the 3D model representation. To
overcome these limitations, we present StableDreamer, a methodology
incorporating three advances. First, inspired by InstructNeRF2NeRF, we
formalize the equivalence of the SDS generative prior and a simple supervised
L2 reconstruction loss. This finding provides a novel tool to debug SDS, which
we use to show the impact of annealing the noise level over time on reducing
multi-faced geometries. Second, our analysis shows that while image-space
diffusion contributes to geometric precision, latent-space diffusion is crucial
for vivid color rendition. Based on this observation, StableDreamer introduces
a two-stage training strategy that effectively combines these aspects,
resulting in high-fidelity 3D models. Third, we adopt an anisotropic 3D
Gaussian representation, replacing Neural Radiance Fields (NeRFs), to enhance
overall quality, reduce memory usage during training, accelerate rendering
speed, and better capture semi-transparent objects. StableDreamer reduces
multi-faced geometries, generates fine details, and converges stably.
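The first point, the equivalence between the SDS generative prior and a simple supervised L2 reconstruction loss, can be made concrete with a short sketch. The snippet below is an illustrative PyTorch-style sketch under assumed interfaces: `unet`, `alphas_cumprod`, and the linear annealing schedule are placeholders, not the authors' implementation. The rendered image is pulled toward a one-step denoised estimate of itself, and the noise level is annealed from coarse to fine over training, which the abstract credits with reducing multi-faced geometry.

```python
import torch
import torch.nn.functional as F


def sds_as_l2_loss(render, text_embedding, unet, alphas_cumprod, step, max_steps):
    """Sketch of SDS viewed as an L2 reconstruction loss toward a denoised target.

    `unet` and `alphas_cumprod` stand in for a frozen 2D diffusion prior
    (hypothetical interface); the schedule below is a simple linear anneal,
    chosen only for illustration.
    """
    # Time-annealed noise level: large t early (coarse layout),
    # small t late (fine detail).
    t_frac = 1.0 - step / max_steps
    t = max(int(t_frac * (len(alphas_cumprod) - 1)), 1)

    alpha_bar = alphas_cumprod[t]
    noise = torch.randn_like(render)
    noisy = alpha_bar.sqrt() * render + (1 - alpha_bar).sqrt() * noise

    with torch.no_grad():
        # Predicted noise from the frozen 2D diffusion prior.
        eps_pred = unet(noisy, t, text_embedding)
        # One-step denoised estimate, used as a detached pseudo ground truth.
        x0_hat = (noisy - (1 - alpha_bar).sqrt() * eps_pred) / alpha_bar.sqrt()

    # The gradient of this L2 loss w.r.t. the rendering matches the SDS
    # gradient direction (eps_pred - noise) up to a time-dependent weight.
    return F.mse_loss(render, x0_hat)
```

Because the target `x0_hat` is an ordinary image, this view also gives a practical way to debug SDS: the pseudo ground truth can be visualized at any noise level to see what the diffusion prior is actually asking the 3D model to match.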
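For the third point, anisotropic 3D Gaussians (as in 3D Gaussian splatting) typically parameterize each primitive's covariance by a per-Gaussian scale and rotation. The sketch below shows that standard parameterization; the function name and tensor shapes are illustrative, not taken from the paper's code.

```python
import torch


def anisotropic_covariance(scale, quaternion):
    """Standard anisotropic 3D Gaussian covariance: Sigma = R S S^T R^T.

    Assumed shapes: scale (N, 3) per-axis scales, quaternion (N, 4) in
    (w, x, y, z) order.
    """
    q = torch.nn.functional.normalize(quaternion, dim=-1)
    w, x, y, z = q.unbind(-1)
    # Rotation matrix from the unit quaternion.
    R = torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(-1, 3, 3)
    S = torch.diag_embed(scale)           # (N, 3, 3) diagonal scale matrices
    M = R @ S
    return M @ M.transpose(-1, -2)        # (N, 3, 3), positive semi-definite
```

Building the covariance as M Mᵀ keeps it positive semi-definite under gradient updates, which is part of why this explicit representation is memory-friendly and fast to render compared with querying a NeRF.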