Elucidating the SNR-t Bias of Diffusion Probabilistic Models

Abstract

Diffusion Probabilistic Models have demonstrated remarkable performance across a wide range of generative tasks. However, we observe that these models often suffer from a Signal-to-Noise Ratio-timestep (SNR-t) bias: a misalignment, during inference, between the SNR of the sample being denoised and the timestep assigned to it. During training, the SNR of a sample is strictly coupled with its timestep. During inference this correspondence is disrupted, which leads to error accumulation and impairs generation quality. We provide comprehensive empirical evidence and theoretical analysis to substantiate this phenomenon, and we propose a simple yet effective differential correction method to mitigate the SNR-t bias. Because diffusion models typically reconstruct low-frequency components before focusing on high-frequency details during the reverse denoising process, we decompose samples into frequency components and apply differential correction to each component individually. Extensive experiments show that our approach significantly improves the generation quality of various diffusion models (IDDPM, ADM, DDIM, A-DPM, EA-DPM, EDM, PFGM++, and FLUX) on datasets of various resolutions, with negligible computational overhead. The code is available at https://github.com/AMAP-ML/DCW.
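
To make the SNR-timestep coupling concrete, the following minimal sketch assumes the standard variance-preserving forward process x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps with a linear beta schedule over 1000 steps, for which SNR(t) = alpha_bar_t / (1 - alpha_bar_t). The function names and the `excess_noise` perturbation are hypothetical and chosen only for illustration. They are not taken from the paper or its repository, and the perturbation is not the paper's model of the bias.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal coefficients alpha_bar_t for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def snr(alpha_bar):
    """SNR of x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    return alpha_bar / (1.0 - alpha_bar)


def nearest_timestep(snr_value, snr_table):
    """Index of the timestep whose training-time log-SNR is closest to snr_value."""
    return int(np.argmin(np.abs(np.log(snr_table) - np.log(snr_value))))


alpha_bar = linear_alpha_bar()
snr_table = snr(alpha_bar)  # strictly decreasing in t

t = 500
# During training, the SNR at timestep t maps back to t exactly.
matched_train = nearest_timestep(snr_table[t], snr_table)

# Hypothetical inference-time perturbation (an assumption for illustration):
# the sample carries 10% more noise variance than the schedule prescribes.
excess_noise = 0.10
snr_inference = alpha_bar[t] / ((1.0 - alpha_bar[t]) * (1.0 + excess_noise))
matched_inference = nearest_timestep(snr_inference, snr_table)

print(f"nominal timestep:                 {t}")
print(f"timestep matching training SNR:   {matched_train}")
print(f"timestep matching perturbed SNR:  {matched_inference}")
```

Under these assumptions, the perturbed sample has the SNR of a later (noisier) timestep than the one the network is conditioned on, which is the kind of mismatch the abstract calls the SNR-t bias.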
