FreSca: Unveiling the Scaling Space in Diffusion Models

April 2, 2025
Authors: Chao Huang, Susan Liang, Yunlong Tang, Li Ma, Yapeng Tian, Chenliang Xu
cs.AI

Abstract

Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and through classifier-free guidance, which enables adjustable scaling. This scaling mechanism implicitly defines a "scaling space" whose potential for fine-grained semantic manipulation remains underexplored. We investigate this space, starting with inversion-based editing, where the difference between the conditional and unconditional noise predictions carries key semantic information. Our core contribution stems from a Fourier analysis of noise predictions, which reveals that their low- and high-frequency components evolve differently throughout diffusion. Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain. FreSca demonstrably enhances existing image editing methods without retraining. Its effectiveness also extends to image understanding tasks such as depth estimation, yielding quantitative gains across multiple datasets.
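
The idea of scaling frequency bands independently can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the bands are separated, so the radial low-pass mask, the `cutoff` value, and the names `frequency_scaled_guidance`, `low_scale`, and `high_scale` are all assumptions made for illustration. The sketch starts from the standard classifier-free guidance form, `eps_uncond + w * (eps_cond - eps_uncond)`, and replaces the single weight `w` with one weight per band.

```python
import numpy as np


def frequency_scaled_guidance(eps_uncond, eps_cond, low_scale, high_scale, cutoff=0.1):
    """Apply separate guidance scales to low and high frequencies (illustrative only).

    eps_uncond, eps_cond: arrays of shape (..., H, W) holding noise predictions.
    cutoff: assumed radial threshold in cycles per sample (hypothetical parameter).
    """
    # The conditional/unconditional difference is the term that guidance scales.
    delta = eps_cond - eps_uncond

    # Build a radial frequency grid matching the layout returned by fft2.
    h, w = delta.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Assumed band split: everything within `cutoff` of DC counts as low frequency.
    low_mask = radius <= cutoff
    weights = np.where(low_mask, low_scale, high_scale)

    # Scale each band in the Fourier domain, then return to the spatial domain.
    spectrum = np.fft.fft2(delta, axes=(-2, -1))
    scaled_delta = np.fft.ifft2(spectrum * weights, axes=(-2, -1)).real

    return eps_uncond + scaled_delta
```

When `low_scale` and `high_scale` are equal, the sketch reduces to ordinary classifier-free guidance with that single weight, which is a convenient sanity check before trying different values per band.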

