Common Diffusion Noise Schedules and Sample Steps are Flawed

Abstract

We find that common diffusion noise schedules do not enforce a zero signal-to-noise ratio (SNR) at the last timestep, and that some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed: they do not reflect the fact that the model is given pure Gaussian noise at inference, which creates a discrepancy between training and inference. We show that this flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to generating images of medium brightness and prevents it from generating very bright or very dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; and (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure that the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.