Common Diffusion Noise Schedules and Sample Steps are Flawed

Abstract

We find that common diffusion noise schedules do not enforce a zero signal-to-noise ratio (SNR) at the last timestep, and that some diffusion sampler implementations do not start from the last timestep. Such designs are flawed: they do not reflect the fact that the model receives pure Gaussian noise at inference, which creates a discrepancy between training and inference. We show that these flaws cause real problems in existing implementations. In Stable Diffusion, they severely limit the model to generating images of medium brightness and prevent it from generating very bright or very dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure that the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.
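
Fix (1) can be illustrated with a minimal sketch. It is not taken from the text above and rests on assumptions: the "scaled linear" schedule and its default values (`beta_start`, `beta_end`, 1000 steps) are illustrative, the function names are hypothetical, and SNR is taken to be ᾱ_t / (1 − ᾱ_t), where ᾱ_t is the cumulative product of (1 − β_t). The rescaling shifts and scales sqrt(ᾱ_t) so that its last value becomes zero while its first value is preserved.

```python
import math

def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    # Assumed schedule for illustration: linear in sqrt(beta), then squared.
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def cumulative_alphas(betas):
    # alpha_bar_t = product over s <= t of (1 - beta_s)
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def snr(betas):
    # Assumed definition: SNR(t) = alpha_bar_t / (1 - alpha_bar_t)
    return [a / (1.0 - a) for a in cumulative_alphas(betas)]

def rescale_zero_terminal_snr(betas):
    # Work on sqrt(alpha_bar): shift so the last value is 0,
    # then scale so the first value is unchanged.
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    ab = [((s - last) * scale) ** 2 for s in sqrt_ab]
    # Recover per-step alphas from the cumulative products, then betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]

if __name__ == "__main__":
    original = scaled_linear_betas()
    rescaled = rescale_zero_terminal_snr(original)
    print("terminal SNR before:", snr(original)[-1])
    print("terminal SNR after: ", snr(rescaled)[-1])
```

Under these assumptions the original schedule leaves a small but nonzero SNR at the last timestep, while the rescaled schedule ends with β_T = 1 and an SNR of exactly zero. At that timestep the input is pure noise, so noise prediction carries no information about the image, which is consistent with the abstract pairing this fix with v prediction.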
