Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
October 1, 2025
Authors: Huangjie Zheng, Shansan Gong, Ruixiang Zhang, Tianrong Chen, Jiatao Gu, Mingyuan Zhou, Navdeep Jaitly, Yizhe Zhang
cs.AI
Abstract
Standard discrete diffusion models treat all unobserved states identically by
mapping them to an absorbing [MASK] token. This creates an 'information void'
where semantic information that could be inferred from unmasked tokens is lost
between denoising steps. We introduce Continuously Augmented Discrete Diffusion
(CADD), a framework that augments the discrete state space with a paired
diffusion in a continuous latent space. This yields graded, gradually corrupted
states in which masked tokens are represented by noisy yet informative latent
vectors rather than collapsed 'information voids'. At each reverse step, CADD
may leverage the continuous latent as a semantic hint to guide discrete
denoising. The design is clean and compatible with existing discrete diffusion
training. At sampling time, the strength and choice of estimator for the
continuous latent vector enable a controlled trade-off between mode-coverage
(generating diverse outputs) and mode-seeking (generating contextually precise
outputs) behaviors. Empirically, we demonstrate that CADD improves generative
quality over mask-based diffusion across text generation, image synthesis, and
code modeling, with consistent gains on both qualitative and quantitative
metrics against strong discrete baselines.
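To make the paired corruption concrete, here is a minimal sketch of what a CADD-style forward step could look like. This is an illustrative assumption, not the paper's actual implementation: the function name `cadd_forward_corrupt`, the linear masking probability, and the simple variance-preserving noise schedule are all hypothetical choices made for the example.

```python
import numpy as np

def cadd_forward_corrupt(tokens, embeddings, t, mask_id, rng):
    """Hypothetical sketch of a paired CADD forward corruption at time t in [0, 1].

    Discrete channel: each token is independently replaced by the absorbing
    [MASK] id with probability t, as in standard masked diffusion.
    Continuous channel: the token's embedding is interpolated toward Gaussian
    noise, so a masked position still carries a noisy-but-informative latent
    instead of an information void.
    """
    masked = rng.random(len(tokens)) < t           # which positions absorb
    xt = np.where(masked, mask_id, tokens)         # discrete state x_t
    alpha = 1.0 - t                                # illustrative noise schedule
    noise = rng.standard_normal(embeddings.shape)
    zt = np.sqrt(alpha) * embeddings + np.sqrt(1.0 - alpha) * noise  # latent z_t
    return xt, zt
```

At `t = 0` the latent equals the clean embedding and no token is masked; at `t = 1` every token is absorbed into `[MASK]` while the latent degrades to pure noise. In between, a reverse-step denoiser could condition on `zt` as the semantic hint the abstract describes.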