
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

May 16, 2023
作者: Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen
cs.AI

Abstract

Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has recently been extended to text generation by generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency than images, and the majority of existing language models are trained using a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved by employing a dynamic number of denoising steps that varies with token position. As a result, tokens on the left undergo fewer denoising steps than those on the right, enabling them to be generated earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrates its superiority over existing diffusion language models and can be 100× to 600× faster while achieving comparable results. Our code will be publicly released.
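The core idea described above, that each token position gets its own denoising timestep so left tokens finish earlier, can be illustrated with a minimal sketch. This is not the authors' implementation; the linear, clipped schedule and the function name below are illustrative assumptions.

```python
def per_token_timesteps(global_step, seq_len, num_diffusion_steps):
    """Illustrative position-dependent denoising schedule (not the paper's exact one).

    global_step runs from 0 up to seq_len - 1 + num_diffusion_steps.
    Each token's remaining noise level is shifted by its position and
    clipped to [0, num_diffusion_steps], so token 0 reaches timestep 0
    (fully denoised) first and the last token reaches it last.
    """
    return [
        min(num_diffusion_steps, max(0, num_diffusion_steps - (global_step - pos)))
        for pos in range(seq_len)
    ]

# At the start, every token is at maximum noise:
print(per_token_timesteps(0, 4, 10))   # [10, 10, 10, 10]
# Partway through, left tokens are further along than right tokens:
print(per_token_timesteps(10, 4, 10))  # [0, 1, 2, 3]
# Eventually all tokens are fully denoised:
print(per_token_timesteps(13, 4, 10))  # [0, 0, 0, 0]
```

The per-token timesteps are non-decreasing from left to right at every global step, which is what lets earlier (less noisy) tokens condition the denoising of later ones.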