ChatPaper.ai

Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling

October 1, 2025
作者: Huangjie Zheng, Shansan Gong, Ruixiang Zhang, Tianrong Chen, Jiatao Gu, Mingyuan Zhou, Navdeep Jaitly, Yizhe Zhang
cs.AI

Abstract

Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void' where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion (CADD), a framework that augments the discrete state space with a paired diffusion in a continuous latent space. This yields graded, gradually corrupted states in which masked tokens are represented by noisy yet informative latent vectors rather than collapsed 'information voids'. At each reverse step, CADD may leverage the continuous latent as a semantic hint to guide discrete denoising. The design is clean and compatible with existing discrete diffusion training. At sampling time, the strength and choice of estimator for the continuous latent vector enable a controlled trade-off between mode-covering (generating diverse outputs) and mode-seeking (generating contextually precise outputs) behaviors. Empirically, we demonstrate that CADD improves generative quality over mask-based diffusion across text generation, image synthesis, and code modeling, with consistent gains on both qualitative and quantitative metrics against strong discrete baselines.
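The core idea above — pairing the absorbing discrete masking process with a continuous Gaussian corruption of the token embeddings, so masked positions retain a noisy but informative latent — can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the mask probability schedule, the linear alpha schedule, and all names (`MASK_ID`, `cadd_forward`, the toy embedding table) are assumptions.

```python
import numpy as np

MASK_ID = 0          # absorbing [MASK] token id (assumed convention)
VOCAB, DIM = 16, 8   # toy vocabulary and embedding size

rng = np.random.default_rng(0)
embedding = rng.normal(size=(VOCAB, DIM))  # toy token embedding table

def cadd_forward(tokens, t, num_steps=100):
    """Corrupt tokens at diffusion step t (toy sketch).

    Discrete channel: each token is independently replaced by [MASK]
    with probability t / num_steps (absorbing process).
    Continuous channel: the clean embeddings receive Gaussian noise
    under an assumed linear alpha schedule, so every position keeps
    a noisy yet informative latent instead of an information void.
    """
    p_mask = t / num_steps
    alpha = 1.0 - p_mask                      # assumed linear schedule
    masked = rng.random(len(tokens)) < p_mask
    x_t = np.where(masked, MASK_ID, tokens)   # discrete state
    z_0 = embedding[tokens]                   # clean continuous latents
    noise = rng.normal(size=z_0.shape)
    z_t = np.sqrt(alpha) * z_0 + np.sqrt(1.0 - alpha) * noise
    return x_t, z_t, masked

tokens = np.array([3, 7, 1, 12, 5])
x_t, z_t, masked = cadd_forward(tokens, t=60)
# Unmasked positions keep their original ids; every position, masked
# or not, carries a continuous latent z_t usable as a semantic hint
# by the reverse (denoising) step.
```

In the reverse process, a denoiser conditioned on both `x_t` and `z_t` can exploit the latent at masked positions rather than seeing only the uninformative [MASK] symbol.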
PDF · October 6, 2025