

Locality in Image Diffusion Models Emerges from Data Statistics

September 11, 2025
作者: Artem Lukoianov, Chenyang Yuan, Justin Solomon, Vincent Sitzmann
cs.AI

Abstract

Among generative models, diffusion models are uniquely intriguing due to the existence of a closed-form optimal minimizer of their training objective, often referred to as the optimal denoiser. However, diffusion using this optimal denoiser merely reproduces images in the training set and hence fails to capture the behavior of deep diffusion models. Recent work has attempted to characterize this gap between the optimal denoiser and deep diffusion models, proposing analytical, training-free models that can generate images that resemble those generated by a trained UNet. The best-performing method hypothesizes that shift equivariance and locality inductive biases of convolutional neural networks are the cause of the performance gap, hence incorporating these assumptions into its analytical model. In this work, we present evidence that the locality in deep diffusion models emerges as a statistical property of the image dataset, not due to the inductive bias of convolutional neural networks. Specifically, we demonstrate that an optimal parametric linear denoiser exhibits similar locality properties to the deep neural denoisers. We further show, both theoretically and experimentally, that this locality arises directly from the pixel correlations present in natural image datasets. Finally, we use these insights to craft an analytical denoiser that better matches scores predicted by a deep diffusion model than the prior expert-crafted alternative.
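The two denoisers the abstract contrasts both have simple closed forms. The optimal (training-objective-minimizing) denoiser is a softmax-weighted average of the training images, with weights set by their distance to the noisy input; an optimal *linear* denoiser under a Gaussian data model is the classic Wiener estimate built from the dataset mean and covariance. The sketch below is illustrative only and is not code from the paper; the function names and the Gaussian-model assumption for the linear case are ours.

```python
import numpy as np

def optimal_denoiser(x, train, sigma):
    """Closed-form optimal denoiser: softmax-weighted average of training images.

    x:     (d,) flattened noisy image
    train: (n, d) flattened training set
    sigma: noise level
    """
    d2 = np.sum((train - x) ** 2, axis=1)        # squared distances to each image
    logw = -d2 / (2.0 * sigma ** 2)
    w = np.exp(logw - logw.max())                # stabilized softmax weights
    w /= w.sum()
    return w @ train                             # collapses onto training data as sigma -> 0

def linear_denoiser(x, mu, cov, sigma):
    """Optimal linear (Wiener-style) denoiser under a Gaussian data model
    with mean mu and covariance cov -- an assumed, simplified stand-in for
    the paper's optimal parametric linear denoiser."""
    d = mu.shape[0]
    gain = cov @ np.linalg.inv(cov + sigma ** 2 * np.eye(d))
    return mu + gain @ (x - mu)
```

At small noise levels the softmax weights concentrate on the nearest training image, which is why diffusion with this denoiser can only reproduce the training set; the rows of the linear denoiser's gain matrix are what the paper inspects for locality.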
PDF · September 16, 2025