StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D
December 2, 2023
Authors: Pengsheng Guo, Hans Hao, Adam Caccavale, Zhongzheng Ren, Edward Zhang, Qi Shan, Aditya Sankar, Alexander G. Schwing, Alex Colburn, Fangchang Ma
cs.AI
Abstract
In the realm of text-to-3D generation, utilizing 2D diffusion models through
score distillation sampling (SDS) frequently leads to issues such as blurred
appearances and multi-faced geometry, primarily due to the intrinsically noisy
nature of the SDS loss. Our analysis identifies the core of these challenges as
the interaction among noise levels in the 2D diffusion process, the
architecture of the diffusion network, and the 3D model representation. To
overcome these limitations, we present StableDreamer, a methodology
incorporating three advances. First, inspired by InstructNeRF2NeRF, we
formalize the equivalence of the SDS generative prior and a simple supervised
L2 reconstruction loss. This finding provides a novel tool to debug SDS, which
we use to show the impact of time-annealing noise levels on reducing
multi-faced geometries. Second, our analysis shows that while image-space
diffusion contributes to geometric precision, latent-space diffusion is crucial
for vivid color rendition. Based on this observation, StableDreamer introduces
a two-stage training strategy that effectively combines these aspects,
resulting in high-fidelity 3D models. Third, we adopt an anisotropic 3D
Gaussian representation, replacing Neural Radiance Fields (NeRFs), to enhance
overall quality, reduce memory usage during training, accelerate rendering,
and better capture semi-transparent objects. StableDreamer reduces
multi-faced geometries, generates fine details, and converges stably.
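
The first advance in the abstract (SDS as a supervised L2 reconstruction loss, combined with time-annealed noise levels) can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes the common forward process x_t = alpha * x + sigma * eps and a one-step denoised estimate x0_hat = (x_t - sigma * eps_hat) / alpha that is treated as a fixed target. Under those assumptions, the SDS residual (eps_hat - eps) equals the gradient of 0.5 * ||x - x0_hat||^2 with respect to x, rescaled by alpha / sigma. The names `fake_noise_predictor`, `sds_vs_l2`, and `annealed_t_max`, and the schedule endpoints 0.98 and 0.30, are hypothetical and chosen only for illustration.

```python
import math
import random


def fake_noise_predictor(x_t, t):
    """Hypothetical stand-in for a diffusion network's noise prediction."""
    return [math.tanh(v) * (1.0 - t) for v in x_t]


def sds_vs_l2(x, t, alpha, sigma, rng):
    """Return (SDS residual, rescaled L2 gradient) for a rendered image x."""
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    # Forward diffusion of the rendered image (assumed parameterization).
    x_t = [alpha * xi + sigma * ei for xi, ei in zip(x, eps)]
    eps_hat = fake_noise_predictor(x_t, t)
    # SDS residual, before any timestep weighting w(t).
    sds = [eh - e for eh, e in zip(eps_hat, eps)]
    # One-step denoised estimate, treated as a constant regression target.
    x0_hat = [(xt - sigma * eh) / alpha for xt, eh in zip(x_t, eps_hat)]
    # Gradient of 0.5 * ||x - x0_hat||^2 w.r.t. x is (x - x0_hat);
    # multiplying by alpha / sigma recovers the SDS residual.
    l2 = [(alpha / sigma) * (xi - x0) for xi, x0 in zip(x, x0_hat)]
    return sds, l2


def annealed_t_max(step, total_steps, t_start=0.98, t_end=0.30):
    """Linearly lower the largest sampled noise level as training proceeds.

    The linear shape and the endpoint values are assumptions for this sketch.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + (t_end - t_start) * frac


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    t = 0.5
    alpha, sigma = math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)
    sds, l2 = sds_vs_l2(x, t, alpha, sigma, rng)
    print(max(abs(a - b) for a, b in zip(sds, l2)))  # difference is at float precision
```

Viewing the update as regression toward x0_hat makes the target an image that can be inspected directly, which is in the spirit of the debugging use the abstract describes. The annealing helper only shows one plausible way to reduce the noise level over time.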