Revisiting Uniform Diffusion Models: Leave-One-Out Denoisers and an Absorbing-State Reformulation

Abstract

Discrete diffusion models are often trained through clean-data prediction, but that prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform Diffusion Models (UDM) they do not. We show that the standard plug-in bridge parameterization for UDM is not optimized by the denoising posterior, but by a leave-one-out posterior that predicts each clean token without using its own noisy observation. This identifies a mismatch between the plug-in ELBO and the usual cross-entropy denoising objective. We characterize the leave-one-out target and derive exact conversions between the denoiser, the leave-one-out posterior, and the score. These conversions allow us to disentangle the parameterization from the training objective. Our results also yield inference improvements without any additional training: an informed predictor-corrector sampler and improved temperature sampling, both based on the leave-one-out predictor. We further introduce an absorbing-state reformulation of uniform diffusion that preserves the UDM joint law while decomposing it into masked-diffusion-like sampling operations, with simpler denoising posteriors, carry-over unmasking, and a natural remasking mechanism. On language modeling, leave-one-out parameterizations consistently improve UDM generation, while the absorbing construction matches or surpasses masked diffusion. These results suggest that the empirical gap between masked and uniform diffusion is driven less by the choice of marginals themselves than by parameterization and sampling design. The code and models can be found at https://github.com/samsongourevitch/rev_udm.
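To make the relationship between the denoiser and the leave-one-out posterior concrete, the following is a minimal Python sketch of how the two could be converted for a single token position. It is not taken from the paper or its repository. It assumes a forward process that corrupts tokens independently, keeping each token with probability `alpha` and otherwise resampling it uniformly over a vocabulary of size `K`; under that assumption, Bayes' rule gives a denoising posterior proportional to the leave-one-out posterior times the likelihood of the token's own noisy observation. The function names and the symbols `alpha` and `K` are hypothetical, and the paper's exact conversion formulas may be stated differently.

```python
def noise_likelihood(observed, alpha, vocab_size):
    """q(x_t = observed | x_0 = a) for each clean value a (assumed uniform kernel)."""
    uniform = (1.0 - alpha) / vocab_size
    return [alpha * (a == observed) + uniform for a in range(vocab_size)]


def loo_to_denoiser(loo, observed, alpha):
    """Fold the token's own noisy observation into a leave-one-out posterior."""
    lik = noise_likelihood(observed, alpha, len(loo))
    unnorm = [p * l for p, l in zip(loo, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def denoiser_to_loo(denoiser, observed, alpha):
    """Divide out the token's own noisy observation (requires alpha < 1)."""
    lik = noise_likelihood(observed, alpha, len(denoiser))
    unnorm = [p / l for p, l in zip(denoiser, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Illustrative values only: a 3-token vocabulary, noisy token 2 observed.
loo = [0.5, 0.3, 0.2]
denoiser = loo_to_denoiser(loo, observed=2, alpha=0.6)
recovered = denoiser_to_loo(denoiser, observed=2, alpha=0.6)
```

In this sketch the denoiser places more mass on the observed token than the leave-one-out posterior does, and the two conversions invert each other, so either quantity could in principle serve as the network output while the other is recovered at inference time.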