Scale Space Diffusion

Abstract

Diffusion models degrade images by adding noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, which raises the question of why they must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations and practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. The project website (https://prateksha.github.io/projects/scale-space-diffusion/) is publicly available.
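
To make the idea of downsampling as a degradation concrete, the following is a minimal sketch, not the implementation described here. It assumes a forward step of the form x_t = alpha_t * D_t(x_0) + sigma_t * eps, where D_t is repeated 2x average pooling; the function names, the toy 8x8 image, and the (levels, alpha, sigma) schedule are all hypothetical values chosen for illustration.

```python
import random

def downsample2x(img):
    """Average-pool a square 2D image (list of rows) by a factor of 2."""
    n = len(img) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]

def degrade(x0, levels, alpha, sigma, rng):
    """Assumed forward step: x_t = alpha * D^levels(x0) + sigma * eps."""
    x = x0
    for _ in range(levels):
        x = downsample2x(x)  # linear degradation: lose fine detail
    # Scale the signal and add Gaussian noise at the reduced resolution.
    return [[alpha * v + sigma * rng.gauss(0.0, 1.0) for v in row] for row in x]

rng = random.Random(0)
x0 = [[float(i + j) for j in range(8)] for i in range(8)]  # toy 8x8 "image"

# Hypothetical schedule: noisier states are stored at lower resolution.
schedule = [(0, 1.0, 0.0), (1, 0.8, 0.6), (2, 0.3, 0.95)]  # (levels, alpha, sigma)
states = [degrade(x0, lv, a, s, rng) for lv, a, s in schedule]
sizes = [len(s) for s in states]  # side length of each state
```

In this sketch the noisiest state is only 2x2, which illustrates why a denoiser would not need to process it at full resolution; a resolution-increasing denoising step would then map a state at one size to the next larger one.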