Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales

June 24, 2025
Authors: Seyedmorteza Sadat, Tobias Vontobel, Farnood Salehi, Romann M. Weber
cs.AI

Abstract

Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although highly effective in practice, the underlying mechanisms by which CFG enhances quality, detail, and prompt alignment are not fully understood. We present a novel perspective on CFG by analyzing its effects in the frequency domain, showing that low and high frequencies have distinct impacts on generation quality. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. However, applying a uniform scale across all frequencies -- as is done in standard CFG -- leads to oversaturation and reduced diversity at high scales and degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and high-frequency components and applies separate guidance strengths to each component. FDG improves image quality at low guidance scales and avoids the drawbacks of high CFG scales by design. Through extensive experiments across multiple datasets and models, we demonstrate that FDG consistently enhances sample fidelity while preserving diversity, leading to improved FID and recall compared to CFG, establishing our method as a plug-and-play alternative to standard classifier-free guidance.
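
To make the decomposition concrete, here is a minimal sketch of the idea in NumPy. It is not the authors' implementation: the abstract does not say how the frequency split is performed, so the ideal FFT low-pass mask, the `cutoff` parameter, and the function names `split_frequencies`, `cfg`, and `fdg` are all assumptions made for illustration. The sketch only shows that standard CFG is applied separately to the low- and high-frequency parts of the conditional and unconditional predictions, each with its own scale.

```python
import numpy as np


def split_frequencies(x, cutoff=0.25):
    """Split an array of shape (..., H, W) into low- and high-frequency parts.

    Assumption: an ideal circular low-pass mask in the FFT domain; `cutoff`
    is in cycles per sample (0.5 is the Nyquist frequency).
    """
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, shape (1, W)
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    low = np.fft.ifft2(np.fft.fft2(x) * mask).real
    high = x - low                    # residual, so low + high == x
    return low, high


def cfg(cond, uncond, scale):
    """Standard classifier-free guidance with one scale for all frequencies."""
    return uncond + scale * (cond - uncond)


def fdg(cond, uncond, scale_low, scale_high, cutoff=0.25):
    """Frequency-decoupled guidance (sketch): a separate scale per band."""
    cond_low, cond_high = split_frequencies(cond, cutoff)
    uncond_low, uncond_high = split_frequencies(uncond, cutoff)
    return (cfg(cond_low, uncond_low, scale_low)
            + cfg(cond_high, uncond_high, scale_high))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = rng.standard_normal((4, 32, 32))    # stand-in conditional prediction
    uncond = rng.standard_normal((4, 32, 32))  # stand-in unconditional prediction
    guided = fdg(cond, uncond, scale_low=1.5, scale_high=6.0)
    print(guided.shape)
```

Because the split is linear and the two parts sum back to the input, setting `scale_low == scale_high` in this sketch reduces to standard CFG. The scales in the usage example are arbitrary placeholders, not values from the paper.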