
Colored Noise Diffusion Sampling

May 28, 2026
Authors: Hadar Davidson, Noam Issachar, Sagie Benaim
cs.AI

Abstract

Diffusion models achieve state-of-the-art image synthesis, and their generative trajectories exhibit a fundamental spectral bias: low-frequency global structure is resolved early, and high-frequency fine detail is resolved later. Conventional stochastic differential equation (SDE) solvers fail to account for this dynamic, naively injecting uniform white noise throughout the entire process and misusing the finite energy budget. In this work, we establish a mathematical framework that recasts SDE inference as a targeted, frequency-decoupled energy transfer. Leveraging this framework, we introduce Colored Noise Sampling (CNS), a novel, training-free stochastic solver. Rather than injecting uniform white noise, CNS uses a dynamic, timestep- and frequency-dependent schedule that allocates injected energy more efficiently toward structurally unresolved frequency bands. By actively exploiting the model's inherent spectral bias, CNS systematically steers the generated distribution toward the true data manifold. Extensive experiments demonstrate that CNS, used as a strictly plug-and-play, inference-time sampler substitution, significantly outperforms standard ODE and SDE baselines across diverse architectures (SiT, JiT, FLUX). Compared to standard sampling on ImageNet-256, CNS achieves substantial unguided FID reductions, improving from 8.26 to 6.27 on SiT-XL/2, from 32.39 to 26.69 on JiT-B/16, and from 11.88 to 8.31 on JiT-H/16, while yielding consistent relative FID improvements with Classifier-Free Guidance. The project page is available at https://hadardavidson.github.io/CNS/.
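
To make the idea of timestep- and frequency-dependent noise concrete, the following is a minimal NumPy sketch of shaping white Gaussian noise in the Fourier domain. It is not the CNS schedule, which the abstract does not specify. The function names, the Gaussian band profile, the `sharpness` parameter, and the convention that `t = 1` is the start of sampling and `t = 0` the end are all assumptions made for illustration. The only property taken from the abstract is that the injected energy depends on both timestep and frequency; here the weights are normalized so the expected total energy matches that of white noise.

```python
import numpy as np


def radial_frequency(height, width):
    """Radial frequency of every 2D FFT bin, rescaled to the range [0, 1]."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return radius / radius.max()


def hypothetical_band_weights(radius, t, sharpness=8.0):
    """Illustrative per-frequency amplitude weights (an assumption, not the paper's schedule).

    t = 1 is taken as the start of sampling and t = 0 as the end. The band
    center moves from low frequencies (t = 1) to high frequencies (t = 0).
    """
    center = 1.0 - t
    weights = np.exp(-sharpness * (radius - center) ** 2)
    # Normalize so the mean squared weight is 1, which keeps the expected
    # total energy equal to that of unit-variance white noise.
    return weights / np.sqrt(np.mean(weights ** 2))


def colored_noise(shape, t, rng):
    """Draw white Gaussian noise and reweight its spectrum for timestep t."""
    height, width = shape
    white = rng.standard_normal(shape)
    weights = hypothetical_band_weights(radial_frequency(height, width), t)
    # The weights are real and symmetric in frequency, so the inverse FFT is
    # real up to floating-point error; the tiny imaginary part is dropped.
    return np.fft.ifft2(np.fft.fft2(white) * weights).real


rng = np.random.default_rng(0)
early = colored_noise((64, 64), t=0.9, rng=rng)  # mostly low-frequency energy
late = colored_noise((64, 64), t=0.1, rng=rng)   # mostly high-frequency energy
print(early.shape, late.shape)
```

In a stochastic sampler, a draw such as `colored_noise(shape, t, rng)` would take the place of the white-noise term at each step; how that term is scaled and combined with the model's drift is specific to the solver and is not shown here.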