Colored Noise Diffusion Sampling
May 28, 2026
Authors: Hadar Davidson, Noam Issachar, Sagie Benaim
cs.AI
Abstract
Diffusion models achieve state-of-the-art image synthesis, with their generative trajectories fundamentally exhibiting a spectral bias, resolving low-frequency global structures early and high-frequency fine details later. Conventional stochastic differential equation (SDE) solvers fail to account for this dynamic, naively injecting uniform white noise throughout the entire process and misusing the finite energy budget. In this work, we establish a mathematical framework that reconsiders SDE inference as a targeted, frequency-decoupled energy transfer. Leveraging this framework, we introduce Colored Noise Sampling (CNS), a novel, training-free stochastic solver. Rather than injecting uniform white noise, CNS utilizes a dynamic, timestep- and frequency-dependent schedule that more efficiently allocates injected energy toward structurally unresolved frequency bands. By actively exploiting the model's inherent spectral bias, CNS systematically steers the generated distribution toward the true data manifold. Extensive experiments demonstrate that CNS significantly outperforms standard ODE and SDE baselines as a strictly plug-and-play, inference-time sampler substitution across diverse architectures (SiT, JiT, FLUX). Compared to standard sampling on ImageNet-256, CNS achieves substantial unguided FID reductions, improving from 8.26 to 6.27 on SiT-XL/2, from 32.39 to 26.69 on JiT-B/16, and from 11.88 to 8.31 on JiT-H/16, while yielding consistent relative FID improvements with Classifier-Free Guidance. The project page is available at https://hadardavidson.github.io/CNS/.
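To make the idea of a timestep- and frequency-dependent noise schedule more concrete, the following is a minimal NumPy sketch of spectrally shaped ("colored") Gaussian noise. It is not the CNS schedule from the paper, which the abstract does not specify. The function names, the `progress` parameter (0 at the start of sampling, 1 at the end), the Gaussian-bump weighting, and the `sharpness` value are all assumptions chosen for illustration. The only properties it tries to capture are that the noise spectrum shifts from low to high frequencies as sampling progresses, and that the expected total energy matches that of unit-variance white noise.

```python
import numpy as np


def radial_frequency(h, w):
    """Radial frequency of every 2-D FFT bin, normalized to [0, 1]."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(fx ** 2 + fy ** 2)
    return r / r.max()


def spectral_weight(shape, progress, sharpness=4.0):
    """Hypothetical per-frequency amplitude weight (an assumption, not CNS).

    A Gaussian bump centered at radial frequency `progress`, so the
    emphasis moves from low to high frequencies as progress goes 0 -> 1.
    The weight is rescaled so its mean square is 1, which keeps the
    expected noise energy equal to that of unit-variance white noise.
    """
    r = radial_frequency(*shape)
    weight = np.exp(-sharpness * (r - progress) ** 2)
    return weight / np.sqrt(np.mean(weight ** 2))


def colored_noise(shape, progress, rng, sharpness=4.0):
    """Filter white Gaussian noise in the Fourier domain."""
    white = rng.standard_normal(shape)
    spectrum = np.fft.fft2(white) * spectral_weight(shape, progress, sharpness)
    # The weight is symmetric in frequency, so the result is real up to
    # floating-point error; discard the negligible imaginary part.
    return np.fft.ifft2(spectrum).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for progress in (0.0, 0.5, 1.0):
        eps = colored_noise((64, 64), progress, rng)
        print(f"progress={progress:.1f}  sample variance={eps.var():.3f}")
```

In a stochastic sampler, a draw such as `colored_noise(shape, progress, rng)` would stand in for the white-noise term of a single SDE step. How CNS actually derives its schedule and couples it to the drift is described in the paper and on the project page, not here.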