Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales

Abstract

Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although highly effective in practice, the underlying mechanisms by which CFG enhances quality, detail, and prompt alignment are not fully understood. We present a novel perspective on CFG by analyzing its effects in the frequency domain, showing that low and high frequencies have distinct impacts on generation quality. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. However, applying a uniform scale across all frequencies, as is done in standard CFG, leads to oversaturation and reduced diversity at high scales and degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and high-frequency components and applies separate guidance strengths to each component. FDG improves image quality at low guidance scales and avoids the drawbacks of high CFG scales by design. Through extensive experiments across multiple datasets and models, we demonstrate that FDG consistently enhances sample fidelity while preserving diversity, leading to improved FID and recall compared to CFG. This establishes our method as a plug-and-play alternative to standard classifier-free guidance.
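
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the ideal FFT low-pass mask, the `cutoff` value, the function names `split_frequencies` and `fdg_combine`, and the CFG convention `uncond + w * (cond - uncond)` are all assumptions chosen for illustration, and the abstract does not say which frequency transform the method actually uses.

```python
import numpy as np


def split_frequencies(x, cutoff=0.25):
    """Split an array into low- and high-frequency parts over its last two axes.

    Assumption: an ideal (hard) radial low-pass mask in the FFT domain;
    `cutoff` is in cycles per sample (the Nyquist limit is 0.5).
    """
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real
    high = x - low                           # residual holds the high frequencies
    return low, high


def fdg_combine(pred_cond, pred_uncond, w_low, w_high, cutoff=0.25):
    """Apply separate guidance scales to the low- and high-frequency bands.

    Hypothetical helper: each band is guided with the usual CFG rule,
    uncond + w * (cond - uncond), using its own scale.
    """
    low_c, high_c = split_frequencies(pred_cond, cutoff)
    low_u, high_u = split_frequencies(pred_uncond, cutoff)
    low = low_u + w_low * (low_c - low_u)
    high = high_u + w_high * (high_c - high_u)
    return low + high


# Usage: a modest low-frequency scale with a stronger high-frequency scale.
rng = np.random.default_rng(0)
cond = rng.standard_normal((3, 32, 32))      # stand-in for a conditional prediction
uncond = rng.standard_normal((3, 32, 32))    # stand-in for an unconditional prediction
guided = fdg_combine(cond, uncond, w_low=1.5, w_high=4.0)
```

Because the split is linear and the two bands sum back to the input, setting `w_low == w_high` in this sketch reproduces standard CFG, which is one way to see why such a scheme can serve as a drop-in replacement.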
