Scale Space Diffusion

Abstract

Diffusion models degrade images by adding noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy through low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, which raises the question of why they must be processed at full resolution. To address this, we formulate a family of diffusion models with generalized linear degradations and practical implementations, thereby fusing scale spaces into the diffusion process. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. The project website is publicly available at https://prateksha.github.io/projects/scale-space-diffusion/.
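
To make the idea of a linear degradation combined with noise concrete, the following is a minimal sketch and not the authors' implementation. It assumes a forward state of the form x_t = A_t x_0 + sigma_t * eps, with A_t chosen here as average-pool downsampling by a factor of 2^level. The function names, the average-pooling choice, and the noise level are all hypothetical and introduced only for illustration.

```python
import numpy as np

def downsample(x, factor):
    """Average-pool an (H, W, C) image by an integer factor.

    Average pooling is an assumed stand-in for the linear degradation A_t.
    """
    h, w, c = x.shape
    assert h % factor == 0 and w % factor == 0, "factor must divide H and W"
    # Split each spatial axis into (blocks, factor) and average within blocks.
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def degrade(x0, level, sigma, rng):
    """Return a downsampled, noisy state A_t x0 + sigma * eps (assumed form)."""
    low = downsample(x0, 2 ** level)
    return low + sigma * rng.standard_normal(low.shape)

rng = np.random.default_rng(0)
x0 = rng.random((64, 64, 3))            # placeholder "image"
xt = degrade(x0, level=3, sigma=1.0, rng=rng)
print(xt.shape)                         # (8, 8, 3)
```

In this toy setting the heavily degraded state has 64 times fewer pixels than the input, which illustrates why processing such states at full resolution may be unnecessary.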