The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum
February 24, 2026
Authors: Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
cs.AI
Abstract
Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at https://s-sahoo.com/duo-ch2
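To make the predictor-corrector idea concrete, the sketch below shows a generic PC sampling loop for a uniform-state discrete diffusion model: a predictor step moves the sequence to a lower noise level, and a corrector step re-noises a small fraction of tokens toward the uniform prior and re-denoises them, which is what enables self-correction. Everything here is a toy stand-in (the `toy_denoiser`, the step rules, and all constants are illustrative assumptions), not the paper's actual sampler.

```python
import numpy as np

VOCAB = 8    # toy vocabulary size (assumption)
LENGTH = 16  # toy sequence length (assumption)
rng = np.random.default_rng(0)

def toy_denoiser(x_t, t):
    """Stand-in for a learned network: per-token categorical probabilities
    over the vocabulary (here: uniform, i.e., an untrained placeholder)."""
    return np.full((LENGTH, VOCAB), 1.0 / VOCAB)

def predictor_step(x_t, t, t_next):
    """Ancestral-style predictor: each token is resampled from the model's
    distribution with probability proportional to the noise removed in this
    step; otherwise it is kept as-is. A simplified, illustrative rule."""
    probs = toy_denoiser(x_t, t)
    resample = rng.random(LENGTH) < (t - t_next) / max(t, 1e-8)
    proposals = np.array([rng.choice(VOCAB, p=p) for p in probs])
    return np.where(resample, proposals, x_t)

def corrector_step(x_t, t, noise_frac=0.1):
    """Forward-backward corrector at a fixed noise level: re-noise a small
    fraction of tokens toward the uniform prior, then denoise them, letting
    the sampler revise earlier mistakes (self-correction)."""
    renoise = rng.random(LENGTH) < noise_frac
    noised = np.where(renoise, rng.integers(0, VOCAB, LENGTH), x_t)
    probs = toy_denoiser(noised, t)
    proposals = np.array([rng.choice(VOCAB, p=p) for p in probs])
    return np.where(renoise, proposals, noised)

def pc_sample(num_steps=32, corrector_iters=1):
    # Start from the uniform-state prior: i.i.d. uniform tokens.
    x = rng.integers(0, VOCAB, LENGTH)
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = predictor_step(x, t, t_next)  # move to the lower noise level
        for _ in range(corrector_iters):  # refine at the new level
            x = corrector_step(x, t_next)
    return x

sample = pc_sample()
print(sample.shape)
```

Unlike a purely ancestral sampler, extra compute here goes into `corrector_iters`, so adding steps keeps refining the sample rather than plateauing, which mirrors the behavior the abstract claims for the PC family.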