Everything at All Scales: Scale-Invariant Diffusion and Continuous Super-Resolution

Abstract

Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce SKILD, a Scale-invariant K-Space Image Learning Diffusion model that unifies generation and continuous super-resolution within a single unconditional framework. Both natural images and critical physical systems exhibit scale invariance, and we use this property to design a forward process that attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process performs generation and continuous super-resolution by varying only the starting timestep: no task-specific architecture, no conditioning branch, no classifier-free guidance, and no retraining per scale factor. Empirically, SKILD reaches FID 2.65 and Inception Score 9.63 on unconditional CIFAR-10; performs 2× to 8× super-resolution on ImageNet from a single unconditional checkpoint while outperforming conditional models across perceptual metrics; and reconstructs critical Ising models whose connected four-point correlations closely track the ground truth.
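To make the forward process described above concrete, the following is a minimal NumPy sketch of one way a fine-to-coarse attenuation with power-law noise could be written in k-space. It is not SKILD's actual formulation. The Gaussian low-pass gain, the log-spaced cutoff schedule, the spectral exponent `alpha`, the scale `sigma`, and the function names are all assumptions introduced for illustration.

```python
import numpy as np

def radial_wavenumber(n):
    """Return |k| on an n x n FFT grid, in cycles per pixel."""
    k = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    return np.sqrt(kx**2 + ky**2)

def signal_gain(k, t, n):
    """Assumed gain: Gaussian low-pass whose cutoff shrinks with t in [0, 1].

    The cutoff is spaced geometrically between the Nyquist frequency (t = 0)
    and the lowest nonzero frequency (t = 1), so equal steps in t remove
    equal ratios of scale.
    """
    k_max, k_min = 0.5, 1.0 / n
    k_cut = k_max * (k_min / k_max) ** t
    return np.exp(-0.5 * (k / k_cut) ** 2)

def forward_sample(image, t, alpha=2.0, sigma=1.0, rng=None):
    """Attenuate fine scales of a square 2D image and fill them with noise.

    The noise amplitude follows |k|^(-alpha/2), an assumed stand-in for
    "spectrum-matched" noise; alpha = 2 mimics the roughly 1/k^2 power
    spectrum often reported for natural images.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = image.shape[0]
    k = radial_wavenumber(n)
    gain = signal_gain(k, t, n)
    # Avoid 0 ** negative at the DC mode; the noise weight there is zero anyway.
    k_safe = np.where(k == 0, 1.0, k)
    noise_amp = sigma * np.sqrt(1.0 - gain**2) * k_safe ** (-alpha / 2)
    white = rng.standard_normal(image.shape)
    spectrum = gain * np.fft.fft2(image) + noise_amp * np.fft.fft2(white)
    # Both filters depend only on |k|, so the result is real up to round-off.
    return np.fft.ifft2(spectrum).real

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((32, 32))
    for t in (0.25, 0.5, 0.75):
        x_t = forward_sample(x, t, rng=rng)
        print(t, x_t.shape, float(x_t.std()))
```

Under these assumptions, the time `t` acts as a scale coordinate: each value corresponds to a cutoff wavenumber below which image content survives. A low-resolution input would then map to an intermediate `t` whose cutoff matches its resolution, which is one way to read the abstract's statement that only the starting timestep changes between generation and super-resolution.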