Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution
May 25, 2026
Authors: Zixin Jessie Chen, Zhuo Chen, Archer Wang, Jeff Gore, William T. Freeman, Congyue Deng, Marin Soljačić
cs.AI
Abstract
Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce SKILD, a Scale-invariant K-Space Image Learning Diffusion model that unifies generation and continuous super-resolution within a single unconditional framework. Natural images and critical physical systems both exhibit scale invariance, and we exploit this property to design a forward process that attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process performs both generation and continuous super-resolution by varying only its starting timestep: no task-specific architecture, no conditioning branch, no classifier-free guidance, and no retraining per scale factor. Empirically, SKILD reaches an FID of 2.65 and an Inception Score of 9.63 on unconditional CIFAR-10; performs 2× to 8× super-resolution on ImageNet from a single unconditional checkpoint while outperforming conditional models across perceptual metrics; and reconstructs critical Ising models whose connected four-point correlations closely track the ground truth.
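The abstract does not specify the attenuation schedule or the noise spectrum, so the following is only a minimal sketch of what a fine-to-coarse k-space forward process of this kind could look like. It assumes a logistic attenuation in log-frequency, a log-linear cutoff schedule, and a power-law spectrum S(k) ∝ k^(-alpha) standing in for "spectrum-matched" noise (in practice S would presumably be estimated from data). All names and parameters here (radial_frequency, cutoff, attenuation, forward, sr_start_time, K_HI, K_LO, width, alpha) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical cutoff range in cycles/pixel (assumed, not taken from the paper).
K_HI = 2.0   # above the 2D Nyquist corner (~0.707), so a(k, 0) is ~1 everywhere
K_LO = 1e-2  # below the lowest nonzero frequency of small images


def radial_frequency(h, w):
    """|k| for every bin of a 2D FFT, in cycles per pixel."""
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(kx**2 + ky**2)


def cutoff(t):
    """Assumed log-linear schedule: cutoff moves from fine (t=0) to coarse (t=1)."""
    return K_HI * (K_LO / K_HI) ** t


def attenuation(k, t, width=0.1):
    """Keep-fraction a(k, t) in (0, 1], logistic in log(k / k_c(t)).

    a depends on k only through k / k_c(t), so rescaling the image
    corresponds to shifting t (the scale-invariance assumption)."""
    a = np.ones_like(k, dtype=float)
    nz = k > 0  # the DC component (mean intensity) is always kept
    a[nz] = 1.0 / (1.0 + np.exp(np.log(k[nz] / cutoff(t)) / width))
    return a


def forward(x0, t, alpha=2.0, rng=None):
    """Sample x_t: attenuate fine scales and refill them with power-law noise.

    Per frequency: X_t(k) = a X_0(k) + sqrt(1 - a^2) sqrt(S(k)) E(k),
    with an assumed spectrum S(k) = k^(-alpha) and white Gaussian E(k).
    If X_0 has power spectrum S, each frequency keeps variance S(k)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = x0.shape
    k = radial_frequency(h, w)
    a = attenuation(k, t)
    spectrum = np.zeros_like(k)
    spectrum[k > 0] = k[k > 0] ** (-alpha)
    X0 = np.fft.fft2(x0, norm="ortho")
    # FFT of real white noise: E|E(k)|^2 = 1 and Hermitian-symmetric coefficients
    E = np.fft.fft2(rng.standard_normal((h, w)), norm="ortho")
    Xt = a * X0 + np.sqrt(1.0 - a**2) * np.sqrt(spectrum) * E
    # a and S are radially symmetric, so Xt stays Hermitian and x_t is real
    return np.fft.ifft2(Xt, norm="ortho").real


def sr_start_time(factor, k_nyquist=0.5):
    """Timestep whose cutoff equals the band limit of an image downsampled by `factor`."""
    return np.log(K_HI * factor / k_nyquist) / np.log(K_HI / K_LO)
```

Under these assumptions, unconditional generation would start the reverse process at t = 1, whereas super-resolution by a factor f would start at sr_start_time(f), for example from the upsampled low-resolution image noised to that timestep. Larger factors map to later starting timesteps, and the trained reverse model itself is unchanged, which mirrors the "vary only the starting timestep" idea in the abstract.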