Common Diffusion Noise Schedules and Sample Steps are Flawed

Abstract

We find that common diffusion noise schedules do not enforce zero signal-to-noise ratio (SNR) at the last timestep, and that some diffusion sampler implementations do not start from the last timestep. Such designs are flawed: they do not reflect the fact that the model is given pure Gaussian noise at inference, which creates a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to generating images of medium brightness and prevents it from generating very bright or very dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure that the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.
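To make fix (1) concrete, here is a minimal sketch of one way to rescale a noise schedule so that its terminal SNR is exactly zero. It assumes a linear beta schedule with illustrative values (1000 steps, betas from 1e-4 to 0.02) and uses SNR = alpha_bar / (1 - alpha_bar). All function names are hypothetical and chosen for this example; it is not presented as the authors' implementation.

```python
import math


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Illustrative linear beta schedule (the default values are assumptions)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def snr(alpha_bar_t):
    """Signal-to-noise ratio at one timestep."""
    return alpha_bar_t / (1.0 - alpha_bar_t)


def rescale_to_zero_terminal_snr(betas):
    """Return betas whose final cumulative alpha (and hence SNR) is zero."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value becomes zero, then scale so the first is unchanged.
    scale = first / (first - last)
    new_alpha_bar = [((s - last) * scale) ** 2 for s in sqrt_ab]
    # Recover per-step betas from ratios of consecutive alpha_bar values.
    new_betas = [1.0 - new_alpha_bar[0]]
    for prev, cur in zip(new_alpha_bar, new_alpha_bar[1:]):
        new_betas.append(1.0 - cur / prev)
    return new_betas


betas = linear_betas()
rescaled = rescale_to_zero_terminal_snr(betas)
print("terminal SNR before:", snr(cumulative_alphas(betas)[-1]))
print("terminal SNR after: ", snr(cumulative_alphas(rescaled)[-1]))
```

In this sketch the original schedule leaves a small but nonzero SNR at the last timestep, while the rescaled one ends at exactly zero because its final beta equals 1. At that timestep the input carries no signal, so a noise-prediction target is no longer informative, which is one reading of why the abstract pairs this fix with v prediction.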

