Stable Score Distillation for High-Quality 3D Generation
December 14, 2023
Authors: Boshi Tang, Jianan Wang, Zhiyong Wu, Lei Zhang
cs.AI
Abstract
Score Distillation Sampling (SDS) has exhibited remarkable performance in
conditional 3D content generation. However, a comprehensive understanding of
the SDS formulation is still lacking, hindering the development of 3D
generation. In this work, we present an interpretation of SDS as a combination
of three functional components: mode-disengaging, mode-seeking and
variance-reducing terms, and analyze the properties of each. We show that
problems such as over-smoothness and color-saturation result from the intrinsic
deficiency of the supervision terms and reveal that the variance-reducing term
introduced by SDS is sub-optimal. Additionally, we shed light on why large
Classifier-Free Guidance (CFG) scales are adopted for 3D generation. Based on the
analysis, we propose a simple yet effective approach named Stable Score
Distillation (SSD) which strategically orchestrates each term for high-quality
3D generation. Extensive experiments validate the efficacy of our approach,
demonstrating its ability to generate high-fidelity 3D content without
succumbing to issues such as over-smoothness and over-saturation, even under
low CFG conditions with the most challenging NeRF representation.
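The abstract describes SDS as a combination of three terms. The sketch below assumes the standard SDS residual with classifier-free guidance, eps_hat - eps with eps_hat = eps_uncond + s * (eps_cond - eps_uncond). It shows one algebraic split of that residual into three parts that sum to it exactly. Which part matches which of the paper's named terms is our reading of the terminology, not a formula quoted from the paper, and this is not the SSD method itself. The function names, the scale 7.5, the random stand-in predictions and the toy correlated predictor are all hypothetical. The weight w(t) and the renderer Jacobian are omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def cfg_noise(eps_cond: Vector, eps_uncond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def sds_residual(eps_cond: Vector, eps_uncond: Vector, eps: Vector, scale: float) -> Vector:
    """SDS residual eps_hat - eps; the weight w(t) and Jacobian dx/dtheta are omitted."""
    eps_hat = cfg_noise(eps_cond, eps_uncond, scale)
    return [h - e for h, e in zip(eps_hat, eps)]


def split_residual(eps_cond: Vector, eps_uncond: Vector, eps: Vector,
                   scale: float) -> Tuple[Vector, Vector, Vector]:
    """Split the residual into three parts that sum to it exactly.

    Labels are one reading of the abstract's terminology (an assumption):
      (scale - 1) * (eps_cond - eps_uncond)  -> 'mode-disengaging'
      eps_cond                               -> 'mode-seeking'
      -eps                                   -> 'variance-reducing' (zero-mean control variate)
    """
    disengage = [(scale - 1.0) * (c - u) for c, u in zip(eps_cond, eps_uncond)]
    seek = list(eps_cond)
    reduce_var = [-e for e in eps]
    return disengage, seek, reduce_var


def sample_variance(xs: List[float]) -> float:
    """Unbiased sample variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-ins for the diffusion model's noise predictions.
    eps = [random.gauss(0.0, 1.0) for _ in range(4)]
    eps_cond = [random.gauss(0.0, 1.0) for _ in range(4)]
    eps_uncond = [random.gauss(0.0, 1.0) for _ in range(4)]
    scale = 7.5

    # The three parts must add back up to the full CFG-guided SDS residual.
    parts = split_residual(eps_cond, eps_uncond, eps, scale)
    total = [sum(triple) for triple in zip(*parts)]
    for a, b in zip(total, sds_residual(eps_cond, eps_uncond, eps, scale)):
        assert math.isclose(a, b, abs_tol=1e-9)

    # Toy control-variate check: if the prediction is correlated with eps
    # (hypothetically 0.9 * eps + small noise), subtracting eps shrinks the spread.
    samples = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(10_000)]
    pred = [0.9 * e + 0.1 * n for e, n in samples]
    without_cv = sample_variance(pred)
    with_cv = sample_variance([p - e for p, (e, _) in zip(pred, samples)])
    print(f"variance without -eps: {without_cv:.3f}, with -eps: {with_cv:.3f}")
```

Running the script checks the split numerically and prints two variances. In this toy setup the second is much smaller, which illustrates the usual intuition for why a zero-mean -eps term can lower gradient variance without changing the expected gradient. Note also that at scale = 1 the first part vanishes, so in this reading large CFG scales amplify exactly that component.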