Scale Space Diffusion

Abstract

Diffusion models degrade images through noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, which raises the question of why they must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations, together with practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze how it scales across resolutions and network depths. Our project website (https://prateksha.github.io/projects/scale-space-diffusion/) is publicly available.
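The central mechanism above is a forward process whose degradation is a linear operator such as downsampling, not noise alone. A small sketch can make this concrete. It is not the authors' formulation. We assume a marginal of the form x_t = α_t D_t(x_0) + σ_t ε. Here D_t is a box-filter average pool, α_t and σ_t follow a cosine variance-preserving schedule, and the resolution halves in equal bands of t. The functions `downsample`, `vp_schedule`, `resolution_factor`, and `forward_sample` are hypothetical names chosen for illustration.

```python
import math
import random

Image = list[list[float]]  # square grayscale image stored as nested lists


def downsample(img: Image, factor: int) -> Image:
    """Average-pool a square image by an integer factor.

    Block averaging is linear in the pixel values, so it is one instance of a
    linear degradation (assumption: a simple box filter, not the paper's kernel).
    """
    n = len(img)
    if n % factor != 0:
        raise ValueError("image size must be divisible by factor")
    m = n // factor
    out: Image = []
    for i in range(m):
        row = []
        for j in range(m):
            block = [img[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def vp_schedule(t: float) -> tuple[float, float]:
    """Hypothetical cosine schedule with alpha^2 + sigma^2 = 1 for t in [0, 1]."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def resolution_factor(t: float, size: int, min_size: int) -> int:
    """Hypothetical schedule: halve the resolution in equal-length bands of t.

    Assumes size // min_size is a power of two. Returns a downsampling factor
    of 1 at t = 0, rising to size // min_size as t approaches 1.
    """
    levels = int(math.log2(size // min_size))
    level = min(int(t * (levels + 1)), levels)
    return 2 ** level


def forward_sample(x0: Image, t: float, min_size: int,
                   rng: random.Random) -> Image:
    """Sample x_t = alpha_t * D_t(x0) + sigma_t * eps on the reduced grid."""
    factor = resolution_factor(t, len(x0), min_size)
    alpha, sigma = vp_schedule(t)
    d = downsample(x0, factor)
    return [[alpha * v + sigma * rng.gauss(0.0, 1.0) for v in row] for row in d]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [[float(i + j) for j in range(8)] for i in range(8)]
    for t in (0.0, 0.5, 0.9):
        # Side length of x_t shrinks as t grows: 8, then 4, then 2.
        print(t, len(forward_sample(x0, t, min_size=2, rng=rng)))
```

In this sketch, x_t at large t lives on a much smaller grid, so a denoiser applied at that step sees far fewer pixels. How Flexi-UNet actually exploits this is not modeled here.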