Common Diffusion Noise Schedules and Sample Steps are Flawed

Abstract

We discover that common diffusion noise schedules do not enforce a zero signal-to-noise ratio (SNR) at the last timestep, and that some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed: at inference the model is given pure Gaussian noise, yet during training it never sees an input with zero SNR, which creates a discrepancy between training and inference. We show that this flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to generating images of medium brightness and prevents it from generating very bright or very dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure that the diffusion process is congruent between training and inference and allow the model to generate samples that are more faithful to the original data distribution.
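The four fixes listed above are small enough to sketch in a few lines each. The following is a minimal pure-Python illustration under stated assumptions, not the authors' code: every function name is hypothetical, the scaled-linear schedule constants (`beta_start=0.00085`, `beta_end=0.012`) are assumed Stable-Diffusion-like defaults, and `guidance=7.5` and `phi=0.7` are illustrative values. The schedule fix shifts sqrt(alpha_bar_t) so that it is exactly zero at the last step, then rescales it so the first step is unchanged. With zero terminal SNR the noisy input at the last step is pure noise, so an epsilon target would be trivial there; the v target reduces to -x0 instead. The "trailing" and "leading" timestep spacings show why a sampler may never start from the last timestep.

```python
import math
from statistics import pstdev


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Assumed Stable-Diffusion-like schedule: betas linear in sqrt space."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    """alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def snr(alpha_bar):
    """Signal-to-noise ratio of x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return alpha_bar / (1.0 - alpha_bar)


def enforce_zero_terminal_snr(betas):
    """Fix (1): make sqrt(alpha_bar) zero at the last step, keep the first step."""
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]  # shift to 0, then rescale
    ab = [s * s for s in sqrt_ab]
    # Recover per-step betas from ratios of consecutive alpha_bar values;
    # the final beta becomes 1.0 because the final alpha_bar is 0.
    return [1.0 - ab[0]] + [1.0 - cur / prev for prev, cur in zip(ab, ab[1:])]


def v_target(x0, eps, alpha_bar):
    """Fix (2): v = sqrt(a) * eps - sqrt(1 - a) * x0, which is -x0 at zero SNR."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * e - s * x for x, e in zip(x0, eps)]


def trailing_timesteps(num_train_steps=1000, num_sample_steps=10):
    """Fix (3): descending sampling timesteps that start at the last step T-1."""
    stride = num_train_steps / num_sample_steps
    return [round(num_train_steps - i * stride) - 1 for i in range(num_sample_steps)]


def leading_timesteps(num_train_steps=1000, num_sample_steps=10):
    """A 'leading' spacing for contrast: it never reaches the last step."""
    stride = num_train_steps // num_sample_steps
    return [i * stride for i in reversed(range(num_sample_steps))]


def rescaled_cfg(pred_pos, pred_neg, guidance=7.5, phi=0.7):
    """Fix (4): classifier-free guidance, rescaled to the std of the
    conditional prediction and blended with the plain CFG result."""
    cfg = [n + guidance * (p - n) for p, n in zip(pred_pos, pred_neg)]
    ratio = pstdev(pred_pos) / pstdev(cfg)
    return [phi * c * ratio + (1.0 - phi) * c for c in cfg]


if __name__ == "__main__":
    betas = scaled_linear_betas()
    print(snr(alphas_cumprod(betas)[-1]))  # small but nonzero
    fixed = enforce_zero_terminal_snr(betas)
    print(snr(alphas_cumprod(fixed)[-1]))  # 0.0
    print(trailing_timesteps()[:3], leading_timesteps()[:3])  # [999, 899, 799] [900, 800, 700]
```

With 10 sampling steps over 1000 training steps, the leading spacing begins at t=900, so the model is never queried at the zero-SNR step even after the schedule is fixed; the trailing spacing begins at t=999.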