FreSca: Unveiling the Scaling Space in Diffusion Models

Abstract

Diffusion models offer impressive controllability for image tasks, primarily through noise predictions, which encode task-specific information, and through classifier-free guidance, which enables adjustable scaling. This scaling mechanism implicitly defines a "scaling space" whose potential for fine-grained semantic manipulation remains underexplored. We investigate this space, starting with inversion-based editing, where the difference between the conditional and unconditional noise predictions carries key semantic information. Our core contribution stems from a Fourier analysis of noise predictions, which reveals that their low- and high-frequency components evolve differently throughout the diffusion process. Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain. FreSca demonstrably enhances existing image editing methods without retraining. Its effectiveness also extends to image understanding tasks such as depth estimation, yielding quantitative gains across multiple datasets.
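The abstract describes FreSca only at a high level. The sketch below illustrates the general idea in NumPy under stated assumptions: the guidance term (conditional minus unconditional noise prediction) is transformed with a 2-D FFT over the last two (spatial) axes, split into a low band and a high band by a hard circular mask, and each band receives its own scale before the inverse FFT. The function names `radial_low_pass_mask` and `frequency_scaled_guidance`, the parameters `low_scale`, `high_scale`, and `cutoff`, and the mask shape are all hypothetical illustrations, not the authors' implementation. When both scales equal a single value w, the sketch reduces to ordinary classifier-free guidance, eps_u + w * (eps_c - eps_u).

```python
import numpy as np


def radial_low_pass_mask(height: int, width: int, cutoff: float) -> np.ndarray:
    """Boolean mask that is True for frequencies within `cutoff` of DC.

    Frequencies are in cycles per sample, as returned by np.fft.fftfreq, so a
    cutoff of about 0.71 or more keeps every frequency.
    Hypothetical choice: a hard circular mask (the paper's band split may differ).
    """
    fy = np.fft.fftfreq(height)[:, None]  # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(width)[None, :]   # horizontal frequencies, shape (1, W)
    return np.sqrt(fy**2 + fx**2) <= cutoff


def frequency_scaled_guidance(eps_uncond, eps_cond, low_scale, high_scale, cutoff=0.1):
    """Combine noise predictions using separate guidance scales per frequency band.

    eps_uncond and eps_cond have shape (..., H, W); the FFT runs over the last two
    axes. All parameter names here are hypothetical, chosen for illustration.
    """
    delta = eps_cond - eps_uncond                          # guidance direction
    spectrum = np.fft.fft2(delta, axes=(-2, -1))           # to the Fourier domain
    low = radial_low_pass_mask(delta.shape[-2], delta.shape[-1], cutoff)
    scale = np.where(low, low_scale, high_scale)           # per-frequency scale, (H, W)
    # The mask is symmetric in +f/-f, so the result is real up to rounding error.
    scaled = np.fft.ifft2(spectrum * scale, axes=(-2, -1)).real
    return eps_uncond + scaled


# Usage with random stand-ins for latent noise predictions (arbitrary values).
rng = np.random.default_rng(0)
eps_u = rng.standard_normal((1, 4, 64, 64))
eps_c = rng.standard_normal((1, 4, 64, 64))
guided = frequency_scaled_guidance(eps_u, eps_c, low_scale=5.0, high_scale=8.0, cutoff=0.1)
print(guided.shape)  # (1, 4, 64, 64)
```

Splitting the guidance term rather than the raw predictions keeps the unconditional prediction untouched, so setting both scales equal recovers standard guidance exactly; the hard mask is the simplest band split, and a smooth (for example Gaussian) mask would be an equally plausible assumption.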
