Scale Space Diffusion

Abstract

Diffusion models degrade images by adding noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, which raises the question of why such states must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations and practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support it, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. Our project website (https://prateksha.github.io/projects/scale-space-diffusion/) is publicly available.
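
The intuition behind the claim about highly noisy states can be illustrated with a small numerical sketch. It is not the paper's formal argument or implementation. It assumes a generic noisy state of the form alpha * x0 + sigma * eps with i.i.d. Gaussian noise, a synthetic smooth test image, and average pooling as the downsampling operator. The names avg_pool, noisy_state and snr are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def avg_pool(img, s):
    """Downsample a 2D array by averaging non-overlapping s x s blocks."""
    h, w = img.shape
    assert h % s == 0 and w % s == 0, "image size must be divisible by s"
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def noisy_state(x0, alpha, sigma, rng):
    """Assumed generic noisy state: scaled clean image plus Gaussian noise."""
    return alpha * x0 + sigma * rng.standard_normal(x0.shape)

def snr(signal, noisy):
    """Ratio of signal variance to the variance of the residual noise."""
    return signal.var() / (noisy - signal).var()

rng = np.random.default_rng(0)

# Synthetic smooth 64x64 image (illustrative stand-in for a real photo).
yy, xx = np.mgrid[0:64, 0:64] / 64.0
x0 = np.sin(2 * np.pi * xx) * np.cos(2 * np.pi * yy)

# A highly noisy state: little signal, unit-variance noise.
alpha, sigma, s = 0.1, 1.0, 8
xt = noisy_state(x0, alpha, sigma, rng)

# Average pooling is linear, so it acts separately on signal and noise.
snr_full = snr(alpha * x0, xt)
snr_small = snr(alpha * avg_pool(x0, s), avg_pool(xt, s))

# Averaging s*s independent samples shrinks the noise std by about s.
noise_std_small = (avg_pool(xt, s) - alpha * avg_pool(x0, s)).std()

print("SNR at full resolution:", snr_full)
print("SNR after 8x downsampling:", snr_small)
print("pooled noise std:", noise_std_small, "vs sigma / s =", sigma / s)
```

In this toy setting the pooled state has a much higher signal-to-noise ratio than the full-resolution one, because pooling averages out the noise while keeping most of the low-frequency signal. That is consistent with the idea that little is lost by handling very noisy states at low resolution.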