Common Diffusion Noise Schedules and Sample Steps are Flawed
May 15, 2023
Authors: Shanchuan Lin, Bingchen Liu, Jiashi Li, Xiao Yang
cs.AI
Abstract
We discover that common diffusion noise schedules do not enforce the last
timestep to have zero signal-to-noise ratio (SNR), and some implementations of
diffusion samplers do not start from the last timestep. Such designs are flawed
and do not reflect the fact that the model is given pure Gaussian noise at
inference, creating a discrepancy between training and inference. We show that
the flawed design causes real problems in existing implementations. In Stable
Diffusion, it severely limits the model to only generate images with medium
brightness and prevents it from generating very bright and dark samples. We
propose a few simple fixes: (1) rescale the noise schedule to enforce zero
terminal SNR; (2) train the model with v prediction; (3) change the sampler to
always start from the last timestep; (4) rescale classifier-free guidance to
prevent over-exposure. These simple changes ensure the diffusion process is
congruent between training and inference and allow the model to generate
samples more faithful to the original data distribution.
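The four fixes listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. For fix (1), it shifts and scales sqrt(alpha_bar_t) linearly so that the last value becomes 0 while the first stays unchanged. That is one way to enforce zero terminal SNR. Timesteps are 0-indexed. The "leading" spacing stands in for a sampler that never reaches the last timestep. The schedule in the demo (betas from 0.00085 to 0.012 over 1000 steps, "scaled_linear") is assumed to resemble Stable Diffusion's. All function names and the default phi = 0.7 in `rescaled_cfg` are hypothetical choices.

```python
import numpy as np


def rescale_zero_terminal_snr(betas: np.ndarray) -> np.ndarray:
    """Fix (1), assumed approach: rescale betas so the last timestep has SNR = 0.

    sqrt(alpha_bar_t) is shifted so its last value is 0, then scaled so its
    first value is unchanged.
    """
    sqrt_ab = np.sqrt(np.cumprod(1.0 - betas))
    first, last = sqrt_ab[0], sqrt_ab[-1]
    sqrt_ab = (sqrt_ab - last) * first / (first - last)
    alpha_bar = sqrt_ab ** 2
    # Convert the cumulative products back into per-step alphas, then betas.
    alphas = np.concatenate([alpha_bar[:1], alpha_bar[1:] / alpha_bar[:-1]])
    return 1.0 - alphas


def snr(alpha_bar: np.ndarray) -> np.ndarray:
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at each timestep."""
    return alpha_bar / (1.0 - alpha_bar)


def v_target(x0, eps, alpha_bar_t):
    """Fix (2): v-prediction target v = sqrt(ab) * eps - sqrt(1 - ab) * x0."""
    return np.sqrt(alpha_bar_t) * eps - np.sqrt(1.0 - alpha_bar_t) * x0


def x0_from_v(x_t, v, alpha_bar_t):
    """Recover x0 from v; still well defined when alpha_bar_t == 0."""
    return np.sqrt(alpha_bar_t) * x_t - np.sqrt(1.0 - alpha_bar_t) * v


def leading_timesteps(num_train: int, num_sample: int) -> np.ndarray:
    """Hypothetical 'leading' spacing: never reaches timestep num_train - 1."""
    return (np.arange(num_sample) * (num_train // num_sample))[::-1]


def trailing_timesteps(num_train: int, num_sample: int) -> np.ndarray:
    """Fix (3): 'trailing' spacing that always starts at timestep num_train - 1."""
    step = num_train / num_sample
    return np.round(num_train - step * np.arange(num_sample)).astype(int) - 1


def rescaled_cfg(pos, neg, guidance_scale: float, phi: float = 0.7):
    """Fix (4): classifier-free guidance with per-sample std rescaling.

    phi blends the rescaled output with plain CFG (phi = 0.7 is an assumed default).
    """
    cfg = neg + guidance_scale * (pos - neg)
    axes = tuple(range(1, pos.ndim))  # every axis except the batch axis
    factor = pos.std(axis=axes, keepdims=True) / cfg.std(axis=axes, keepdims=True)
    return phi * (cfg * factor) + (1.0 - phi) * cfg


if __name__ == "__main__":
    T = 1000
    betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2
    print("terminal SNR before:", snr(np.cumprod(1.0 - betas))[-1])  # small, nonzero
    fixed_ab = np.cumprod(1.0 - rescale_zero_terminal_snr(betas))
    print("terminal SNR after: ", snr(fixed_ab)[-1])  # 0.0
    print("leading: ", leading_timesteps(T, 10))   # 900, 800, ..., 0
    print("trailing:", trailing_timesteps(T, 10))  # 999, 899, ..., 99
```

In this sketch, fix (2) matters because zero terminal SNR makes x_T pure noise. An epsilon-prediction target at that step then says nothing about x0, and x0 cannot be recovered from it, since that would require dividing by sqrt(alpha_bar_T) = 0. The v target, by contrast, reduces to -x0 at that step.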