FreSca: Unveiling the Scaling Space in Diffusion Models

Abstract

Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and classifier-free guidance that enables adjustable scaling. This scaling mechanism implicitly defines a "scaling space" whose potential for fine-grained semantic manipulation remains underexplored. We investigate this space, starting with inversion-based editing, where the difference between the conditional and unconditional noise predictions carries key semantic information. Our core contribution stems from a Fourier analysis of noise predictions, which reveals that their low- and high-frequency components evolve differently throughout the diffusion process. Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain. FreSca demonstrably enhances existing image editing methods without retraining. Its effectiveness also extends to image understanding tasks such as depth estimation, yielding quantitative gains across multiple datasets.
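The frequency-dependent scaling described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `frequency_scaled_guidance`, the centered square low-pass mask, the `cutoff` parameter, and the way the scaled difference is added back to the unconditional prediction are all assumptions made for illustration. The abstract only states that guidance scaling is applied independently to different frequency bands in the Fourier domain.

```python
import numpy as np


def frequency_scaled_guidance(eps_cond, eps_uncond,
                              low_scale=1.0, high_scale=1.0, cutoff=4):
    """Hypothetical sketch: scale low/high frequency bands of the
    conditional-minus-unconditional noise difference separately.

    eps_cond, eps_uncond: arrays of shape (..., H, W).
    low_scale, high_scale: assumed per-band guidance scales.
    cutoff: assumed half-width of the square low-frequency region.
    """
    # Difference that, per the abstract, carries the semantic information.
    delta = eps_cond - eps_uncond

    # 2D FFT over the spatial axes, shifted so the DC term is centered.
    spectrum = np.fft.fftshift(np.fft.fft2(delta, axes=(-2, -1)),
                               axes=(-2, -1))

    # Assumed band split: a square window around the center is "low".
    h, w = delta.shape[-2:]
    cy, cx = h // 2, w // 2
    low_mask = np.zeros((h, w), dtype=bool)
    low_mask[max(cy - cutoff, 0):cy + cutoff + 1,
             max(cx - cutoff, 0):cx + cutoff + 1] = True

    # Apply a different scale to each band.
    scaled = np.where(low_mask, low_scale * spectrum, high_scale * spectrum)

    # Back to the spatial domain; the imaginary part is numerical noise.
    delta_scaled = np.fft.ifft2(np.fft.ifftshift(scaled, axes=(-2, -1)),
                                axes=(-2, -1)).real

    # Assumed recombination, mirroring classifier-free guidance.
    return eps_uncond + delta_scaled


# Usage with random stand-ins for the two noise predictions.
rng = np.random.default_rng(0)
eps_c = rng.standard_normal((4, 16, 16))
eps_u = rng.standard_normal((4, 16, 16))
guided = frequency_scaled_guidance(eps_c, eps_u, low_scale=1.0, high_scale=1.5)
```

When both scales are equal, this sketch reduces to ordinary classifier-free guidance with a single scale, so the two parameters can be read as splitting that one scale into a low-frequency and a high-frequency knob.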
