Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales

Abstract

Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although CFG is highly effective in practice, the mechanisms by which it improves quality, detail, and prompt alignment are not fully understood. We present a new perspective on CFG by analyzing its effects in the frequency domain, and show that low and high frequencies affect generation quality in distinct ways. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. Standard CFG, however, applies a single scale to all frequencies, which leads to oversaturation and reduced diversity at high scales and to degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), which decomposes CFG into low- and high-frequency components and applies a separate guidance strength to each. FDG improves image quality at low guidance scales and avoids the drawbacks of high CFG scales by design. Through extensive experiments across multiple datasets and models, we demonstrate that FDG consistently enhances sample fidelity while preserving diversity, yielding better FID (Fréchet Inception Distance) and recall than CFG. These results establish our method as a plug-and-play alternative to standard classifier-free guidance.
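The decoupling idea can be illustrated with a minimal NumPy sketch. The abstract does not say how the frequency split is performed, so the hard radial FFT mask, the `cutoff` value, and the names `low_pass`, `cfg`, and `fdg` below are all assumptions made for illustration, not the authors' implementation. The sketch splits the guidance term (conditional minus unconditional prediction) into a low-frequency part and a high-frequency remainder, then scales each part separately.

```python
import numpy as np


def low_pass(x: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Keep spatial frequencies with radius <= cutoff (cycles per pixel).

    Assumption: a hard radial FFT mask stands in for whatever frequency
    decomposition the paper actually uses. x has shape (..., H, W).
    """
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, shape (1, W)
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    return np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real


def cfg(cond: np.ndarray, uncond: np.ndarray, scale: float) -> np.ndarray:
    """Standard CFG: one scale applied to the whole guidance term."""
    return uncond + scale * (cond - uncond)


def fdg(cond: np.ndarray, uncond: np.ndarray,
        w_low: float, w_high: float, cutoff: float = 0.25) -> np.ndarray:
    """Frequency-decoupled guidance sketch: separate scales per band."""
    diff = cond - uncond
    diff_low = low_pass(diff, cutoff)  # global structure, condition alignment
    diff_high = diff - diff_low        # fine detail (exact remainder)
    return uncond + w_low * diff_low + w_high * diff_high


# Illustrative usage with random stand-ins for the two model predictions.
rng = np.random.default_rng(0)
cond = rng.standard_normal((3, 32, 32))
uncond = rng.standard_normal((3, 32, 32))
guided = fdg(cond, uncond, w_low=1.5, w_high=4.0)  # values are illustrative
```

Because the two bands sum back to the full guidance term, setting `w_low == w_high` in this sketch reduces it to standard CFG, which is one way to read the abstract's "plug-and-play" claim. The scale values in the usage lines are placeholders, not settings reported by the paper.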
