Revisiting Diffusion Model Predictions Through Dimensionality
January 29, 2026
Authors: Qing Jin, Chaoyang Wang
cs.AI
Abstract
Recent advances in diffusion and flow matching models have highlighted a shift in the preferred prediction target -- moving from noise (ε) and velocity (v) to direct data (x) prediction -- particularly in high-dimensional settings. However, a formal explanation of why the optimal target depends on the specific properties of the data remains elusive. In this work, we provide a theoretical framework based on a generalized prediction formulation that accommodates arbitrary output targets, of which ε-, v-, and x-prediction are special cases. We derive the analytical relationship between the data's geometry and the optimal prediction target, offering a rigorous justification for why x-prediction becomes superior when the ambient dimension significantly exceeds the data's intrinsic dimension. Furthermore, while our theory identifies dimensionality as the governing factor for the optimal prediction target, the intrinsic dimension of manifold-bound data is typically intractable to estimate in practice. To bridge this gap, we propose k-Diff, a framework that employs a data-driven approach to learn the optimal prediction parameter k directly from data, bypassing the need for explicit dimension estimation. Extensive experiments in both latent-space and pixel-space image generation demonstrate that k-Diff consistently outperforms fixed-target baselines across varying architectures and data scales, providing a principled and automated approach to enhancing generative performance.
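To make concrete how ε-, v-, and x-prediction can be special cases of one k-parameterized target, here is a minimal sketch. The abstract does not give the paper's actual parameterization, schedule, or loss; this assumes a rectified-flow interpolation x_t = (1 − t)·x₀ + t·ε (so v = ε − x₀) and a hypothetical linear target family f_k = (1 − k)·ε − k·x₀, under which the three classic targets appear at k = 0, 1/2, and 1 (the latter two up to a constant scale/sign a network could absorb).

```python
import numpy as np

def forward_sample(x0, eps, t):
    """Noisy sample under a rectified-flow schedule (an assumption,
    not necessarily the schedule used in the paper)."""
    return (1.0 - t) * x0 + t * eps

def generalized_target(x0, eps, k):
    """Hypothetical linear target family f_k = (1 - k)*eps - k*x0.

    Special cases (up to constant scale/sign):
      k = 0.0 -> eps-prediction:        f_0   = eps
      k = 0.5 -> v-prediction:          f_0.5 = 0.5 * (eps - x0)
      k = 1.0 -> x-prediction:          f_1   = -x0
    """
    return (1.0 - k) * eps - k * x0

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)   # clean data point
eps = rng.standard_normal(8)  # Gaussian noise

# Recover the three classic targets from the single parameter k.
print(np.allclose(generalized_target(x0, eps, 0.0), eps))               # eps-prediction
print(np.allclose(generalized_target(x0, eps, 0.5), 0.5 * (eps - x0)))  # scaled v
print(np.allclose(generalized_target(x0, eps, 1.0), -x0))               # signed x
```

In a k-Diff-style setup, k would be a learnable scalar optimized alongside the network rather than fixed in advance, which is what removes the need to estimate the data's intrinsic dimension explicitly.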