The Diffusion Duality
June 12, 2025
Authors: Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov
cs.AI
Abstract
Uniform-state discrete diffusion models hold the promise of fast text
generation due to their inherent ability to self-correct. However, they are
typically outperformed by autoregressive models and masked diffusion models. In
this work, we narrow this performance gap by leveraging a key insight:
Uniform-state diffusion processes naturally emerge from an underlying Gaussian
diffusion. Our method, Duo, transfers powerful techniques from Gaussian
diffusion to improve both training and sampling. First, we introduce a
curriculum learning strategy guided by the Gaussian process, doubling training
speed by reducing variance. Models trained with curriculum learning surpass
autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we
present Discrete Consistency Distillation, which adapts consistency
distillation from the continuous to the discrete setting. This algorithm
unlocks few-step generation in diffusion language models by accelerating
sampling by two orders of magnitude. We provide the code and model checkpoints
on the project page: http://s-sahoo.github.io/duo
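To make the stated insight concrete, below is a minimal illustrative sketch, not the authors' exact formulation: Gaussian noise is added to a one-hot token vector and the argmax is read out as a discrete token, yielding a categorical distribution that interpolates between the clean token and the uniform distribution, i.e., a uniform-state discrete corruption. The variance-preserving schedule `alpha`, the vocabulary size, and the Monte Carlo estimate are assumptions for illustration only; the paper derives the precise correspondence.

```python
# Illustrative sketch (assumed parameterization, not Duo's exact derivation):
# estimate P(argmax(z_t) = k) where z_t = alpha * one_hot(x) + sqrt(1 - alpha^2) * noise.
import numpy as np

def argmax_of_gaussian_diffusion(token_id, vocab, alpha, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(vocab)
    x[token_id] = 1.0                                     # clean one-hot token
    noise = rng.standard_normal((n_samples, vocab))
    z_t = alpha * x + np.sqrt(1.0 - alpha**2) * noise     # Gaussian-diffused latent
    discrete = z_t.argmax(axis=1)                         # read out a discrete token
    return np.bincount(discrete, minlength=vocab) / n_samples

probs = argmax_of_gaussian_diffusion(token_id=3, vocab=8, alpha=0.5)
print(probs)  # most mass on token 3; remaining mass spread uniformly over the others
```

At `alpha = 1` the readout is deterministic (the clean token); at `alpha = 0` it is uniform over the vocabulary, which is the behavior of a uniform-state discrete diffusion process.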