Everything at All Scales: Scale-Invariant Diffusion and Continuous Super-Resolution

Abstract

Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce SKILD, a Scale-invariant K-Space Image Learning Diffusion model that unifies generation and continuous super-resolution within a single unconditional framework. Both natural images and critical physical systems exhibit scale invariance, and we leverage this property to design a forward process that attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process performs both generation and continuous super-resolution by varying only the starting timestep: no task-specific architecture, no conditioning branch, no classifier-free guidance, and no retraining per scale factor. Empirically, SKILD reaches an FID of 2.65 and an Inception Score of 9.63 on unconditional CIFAR-10; performs 2× to 8× super-resolution on ImageNet from a single unconditional checkpoint while outperforming conditional models across perceptual metrics; and reconstructs critical Ising models whose connected four-point correlations closely track the ground truth.
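As a rough illustration of the forward process described above, the following NumPy sketch attenuates an image in k-space from fine to coarse scales and replaces the removed content with Gaussian noise that follows a power-law spectrum. The abstract does not specify the actual schedule, filter, or noise spectrum, so everything here is an assumption made for illustration only: the Gaussian low-pass shape, the geometric cutoff schedule, the exponent `alpha`, and the function names `radial_frequency`, `attenuation`, and `forward_step` are all hypothetical and should not be read as SKILD's implementation.

```python
import numpy as np

def radial_frequency(n):
    """Return |k| on an n x n FFT grid (cycles per pixel); k = 0 sits at [0, 0]."""
    f = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(f, f, indexing="ij")
    return np.sqrt(kx**2 + ky**2)

def attenuation(k, t):
    """Hypothetical per-frequency signal weight at time t in [0, 1].

    The cutoff moves geometrically from the highest grid frequency (t = 0)
    to the lowest nonzero one (t = 1), so equal time steps cover equal
    ranges of log-scale. This schedule is an assumption, not the paper's.
    """
    n = k.shape[0]
    k_min, k_max = 1.0 / n, k.max()
    k_cut = k_max * (k_min / k_max) ** t
    return np.exp(-(k / k_cut) ** 2)  # smooth low-pass; the shape is assumed

def forward_step(image, t, alpha=2.0, rng=None):
    """Attenuate fine scales of a square grayscale image and add shaped noise.

    alpha is an assumed power-law exponent: noise power falls off as
    |k|^(-alpha), loosely mimicking the spectrum of natural images.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = image.shape[0]
    k = radial_frequency(n)
    keep = attenuation(k, t)

    # Power-law amplitude for the noise; no noise is injected at k = 0.
    amp = np.maximum(k, 1.0 / n) ** (-alpha / 2.0)
    amp[0, 0] = 0.0

    # White Gaussian noise, shaped in k-space. The filters depend only on |k|,
    # so the filtered field stays real up to floating-point rounding.
    eps_k = np.fft.fft2(rng.standard_normal((n, n)))
    x_k = np.fft.fft2(image)
    out_k = keep * x_k + np.sqrt(1.0 - keep**2) * amp * eps_k
    return np.fft.ifft2(out_k).real

# Example: corrupt a random 32 x 32 "image" halfway through the process.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32))
xt = forward_step(x0, t=0.5, rng=rng)
```

In this sketch, larger `t` removes progressively coarser content, which is one way to read the abstract's statement that scale becomes a coordinate of the diffusion dynamics: under that reading, starting the reverse process at an intermediate `t` from a low-resolution input would correspond to super-resolution, and starting at `t = 1` to generation.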