Score Distillation Sampling with Learned Manifold Corrective

Abstract

Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects. Instead, we train a shallow network that mimics the timestep-dependent denoising deficiency of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and effectiveness of our novel loss formulation through several qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
