Elucidating the SNR-t Bias of Diffusion Probabilistic Models

Abstract

Diffusion probabilistic models have demonstrated remarkable performance across a wide range of generative tasks. However, we observe that these models commonly suffer from a signal-to-noise ratio-timestep (SNR-t) bias: during inference, the SNR of the sample being denoised no longer matches the timestep assigned to it. During training, the SNR of a sample is strictly coupled with its timestep; at inference this correspondence breaks down, so errors accumulate and generation quality degrades. We substantiate this phenomenon with comprehensive empirical evidence and theoretical analysis, and we propose a simple yet effective differential correction method to mitigate the SNR-t bias. Because diffusion models typically reconstruct low-frequency components before refining high-frequency details during the reverse denoising process, we decompose samples into frequency components and apply differential correction to each component separately. Extensive experiments show that our approach significantly improves the generation quality of a variety of diffusion models (IDDPM, ADM, DDIM, A-DPM, EA-DPM, EDM, PFGM++, and FLUX) on datasets of different resolutions, with negligible computational overhead. The code is available at https://github.com/AMAP-ML/DCW.
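
To make the training-time coupling between SNR and timestep concrete, the following is a minimal sketch under the assumption of a standard DDPM-style forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with a linear β schedule. The abstract does not state which schedule or parameterization is used, and the names `linear_alpha_bar`, `snr`, and `forward_sample` are illustrative, not taken from the released code. Under this assumption, SNR(t) = ᾱ_t / (1 − ᾱ_t) is a fixed, strictly decreasing function of t, which is the coupling that inference can violate.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def snr(alpha_bar):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t) for the assumed forward process."""
    return alpha_bar / (1.0 - alpha_bar)

def forward_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0): the noise level is fully determined by t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

alpha_bar = linear_alpha_bar()
snr_t = snr(alpha_bar)

# A training sample at step t always carries exactly SNR(t).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))   # stand-in for a normalized image
x_t = forward_sample(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

During training the network only ever sees pairs (x_t, t) whose SNR equals `snr_t[t]`. At inference, the sample handed to the network at step t comes from the previous predicted step rather than from `forward_sample`, so its actual SNR can drift away from `snr_t[t]`; that drift is the SNR-t bias described above.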