AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
May 16, 2023
Authors: Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen
cs.AI
Abstract
Diffusion models have gained significant attention in the realm of image
generation due to their exceptional performance. Recently, their success has
been extended to text generation by generating all tokens in a sequence
concurrently. However, natural language exhibits a far more pronounced
sequential dependency than images, and most existing language models are
trained with a left-to-right auto-regressive approach.
To account for the inherent sequential characteristic of natural language, we
introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that
the generation of tokens on the right depends on the generated ones on the
left, a mechanism achieved through employing a dynamic number of denoising
steps that vary based on token position. This results in tokens on the left
undergoing fewer denoising steps than those on the right, enabling them to be
generated earlier and subsequently influence the generation of tokens on the
right. In a series of experiments on various text generation tasks, including
text summarization, machine translation, and common sense generation,
AR-Diffusion clearly demonstrated its superiority over existing diffusion
language models and showed that it can be 100× to 600× faster when achieving
comparable results. Our code will be publicly released.
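
To make the position-dependent mechanism concrete, below is a minimal Python
sketch of one schedule in which tokens further left are given fewer denoising
steps and therefore finish earlier. The linear per-position step budget, the
function name, and the min_moves parameter are illustrative assumptions, not
the paper's exact formulation or released code.

```python
import numpy as np

def token_timesteps(seq_len: int, total_steps: int, move: int,
                    min_moves: int = 4) -> np.ndarray:
    """Per-token diffusion timesteps at global decoding move `move`.

    Every token travels from timestep `total_steps` (pure noise) down to 0
    (clean), but token 0 finishes within `min_moves` moves while token n
    needs `min_moves + n` moves. Left tokens therefore take fewer, larger
    denoising steps and are finalized earlier, so they can condition the
    still-noisy tokens to their right. In practice the fractional values
    below would be rounded to the model's discrete diffusion steps.
    """
    moves_needed = min_moves + np.arange(seq_len)  # per-position step budget
    t = total_steps * (1.0 - move / moves_needed)  # linear descent per token
    return np.clip(t, 0.0, float(total_steps))

# Example: 5 tokens, 8 diffusion steps. Position 0 is clean after 4 moves;
# position 4 only after 8, so left tokens are generated first.
for move in range(9):
    print(move, token_timesteps(5, 8, move))
```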