

Revisiting Diffusion Model Predictions Through Dimensionality

January 29, 2026
Authors: Qing Jin, Chaoyang Wang
cs.AI

Abstract

Recent advances in diffusion and flow matching models have highlighted a shift in the preferred prediction target -- moving from noise (ε) and velocity (v) to direct data (x) prediction -- particularly in high-dimensional settings. However, a formal explanation of why the optimal target depends on the specific properties of the data remains elusive. In this work, we provide a theoretical framework based on a generalized prediction formulation that accommodates arbitrary output targets, of which ε-, v-, and x-prediction are special cases. We derive the analytical relationship between the data's geometry and the optimal prediction target, offering a rigorous justification for why x-prediction becomes superior when the ambient dimension significantly exceeds the data's intrinsic dimension. Furthermore, while our theory identifies dimensionality as the governing factor for the optimal prediction target, the intrinsic dimension of manifold-bound data is typically intractable to estimate in practice. To bridge this gap, we propose k-Diff, a framework that employs a data-driven approach to learn the optimal prediction parameter k directly from data, bypassing the need for explicit dimension estimation. Extensive experiments in both latent-space and pixel-space image generation demonstrate that k-Diff consistently outperforms fixed-target baselines across varying architectures and data scales, providing a principled and automated approach to enhancing generative performance.
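The abstract describes a generalized prediction formulation in which ε-, v-, and x-prediction arise as special cases of one parameterized target, and a k-Diff framework that learns the target parameter k from data rather than estimating the intrinsic dimension. The sketch below is only an illustration of that idea under explicit assumptions: it uses the standard forward process x_t = α_t·x_0 + σ_t·ε, writes targets as linear combinations a·x_0 + b·ε, and mixes the x- and ε-targets with one learnable scalar k. The function names, the convex-mixing form of k, and the example schedule values are assumptions made here for illustration; the paper's actual k-Diff parameterization is not given in this abstract.

```python
# Minimal sketch (not the paper's exact formulation) of a generalized
# prediction target containing x-, eps-, and v-prediction as special cases,
# with a single parameter k that could be learned alongside the network.
import torch


def diffuse(x0, eps, alpha_t, sigma_t):
    """Forward process x_t = alpha_t * x0 + sigma_t * eps."""
    return alpha_t * x0 + sigma_t * eps


def linear_target(x0, eps, a, b):
    """Generic regression target a * x0 + b * eps.

    (a, b) = (1, 0)              -> x-prediction
    (a, b) = (0, 1)              -> eps-prediction
    (a, b) = (-sigma_t, alpha_t) -> v-prediction (v = alpha_t*eps - sigma_t*x0)
    """
    return a * x0 + b * eps


def k_mixed_target(x0, eps, k):
    """One possible single-parameter family: a convex mix of the x- and
    eps-targets (k=1 -> x-prediction, k=0 -> eps-prediction).
    Purely illustrative; the paper's k may enter the target differently."""
    return k * x0 + (1.0 - k) * eps


if __name__ == "__main__":
    x0 = torch.randn(4, 3, 32, 32)             # clean data batch
    eps = torch.randn_like(x0)                  # Gaussian noise
    alpha_t, sigma_t = 0.8, 0.6                 # example noise-schedule values
    x_t = diffuse(x0, eps, alpha_t, sigma_t)    # noisy input to the network
    k = torch.nn.Parameter(torch.tensor(0.7))   # learnable mixing parameter
    target = k_mixed_target(x0, eps, k)         # target the network regresses
    print(x_t.shape, target.shape, float(k))
```

In a training loop, the network output at (x_t, t) would be compared to the chosen target with an MSE loss; registering k as a trainable parameter lets the objective itself adapt during training, which is the kind of automation the abstract attributes to k-Diff.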