

Think While You Generate: Discrete Diffusion with Planned Denoising

October 8, 2024
Authors: Sulin Liu, Juno Nam, Andrew Campbell, Hannes Stärk, Yilun Xu, Tommi Jaakkola, Rafael Gómez-Bombarelli
cs.AI

Abstract

Discrete diffusion has achieved state-of-the-art performance, outperforming or approaching autoregressive models on standard benchmarks. In this work, we introduce Discrete Diffusion with Planned Denoising (DDPD), a novel framework that separates the generation process into two models: a planner and a denoiser. At inference time, the planner selects which positions to denoise next by identifying the most corrupted positions in need of denoising, including both initially corrupted and those requiring additional refinement. This plan-and-denoise approach enables more efficient reconstruction during generation by iteratively identifying and denoising corruptions in the optimal order. DDPD outperforms traditional denoiser-only mask diffusion methods, achieving superior results on language modeling benchmarks such as text8, OpenWebText, and token-based generation on ImageNet 256×256. Notably, in language modeling, DDPD significantly reduces the performance gap between diffusion-based and autoregressive methods in terms of generative perplexity. Code is available at https://github.com/liusulin/DDPD.
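The plan-and-denoise loop described above can be sketched as follows. This is a minimal, hypothetical illustration of the control flow only: the `planner` and `denoiser` callables here are toy stand-ins (a real DDPD setup uses two trained neural models, and the paper's exact sampling procedure is in the linked repository).

```python
def plan_and_denoise(tokens, planner, denoiser, num_steps):
    """Toy sketch of planned denoising: at each step the planner scores
    every position by estimated corruption, and the denoiser resamples
    the highest-scoring position."""
    tokens = list(tokens)
    for _ in range(num_steps):
        scores = [planner(tokens, i) for i in range(len(tokens))]
        pos = max(range(len(tokens)), key=scores.__getitem__)  # most corrupted position
        tokens[pos] = denoiser(tokens, pos)
    return tokens

# Hypothetical stand-ins for illustration only: the "planner" flags
# positions that differ from a fixed target string, and the "denoiser"
# copies in the target token at the chosen position.
TARGET = list("discrete diffusion")

def toy_planner(tokens, i):
    return 1.0 if tokens[i] != TARGET[i] else 0.0

def toy_denoiser(tokens, i):
    return TARGET[i]

corrupted = list("dxscrete_difzusion")
restored = plan_and_denoise(corrupted, toy_planner, toy_denoiser, num_steps=3)
print("".join(restored))  # → "discrete diffusion"
```

The key design point the sketch mirrors is the separation of concerns: the planner only decides *where* to act next, while the denoiser only decides *what* value to write there.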

