AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

Abstract


Diffusion models have gained significant attention in image generation due to their exceptional performance. Their success has recently been extended to text generation by generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency than images, and most existing language models are trained using a left-to-right auto-regressive approach. To account for this inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the already generated tokens on the left, a mechanism achieved by employing a dynamic number of denoising steps that varies with token position. As a result, tokens on the left undergo fewer denoising steps than those on the right, which allows them to be generated earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly outperformed existing diffusion language models and was 100× to 600× faster when achieving comparable results. Our code will be publicly released.
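The idea of a position-dependent number of denoising steps can be illustrated with a small sketch. It assumes a simple linear schedule in which every token starts fully noised at level `max_t`, and token `n` becomes clean after `max_t + n * delay` global steps. The function names and the `delay` parameter are hypothetical, and this rule is an illustration of the general principle, not the exact schedule used by AR-Diffusion.

```python
from typing import List


def finish_steps(num_tokens: int, max_t: int, delay: int) -> List[int]:
    """Global step at which each token becomes fully denoised.

    Hypothetical linear rule (an assumption, not the paper's formula):
    token n needs max_t + n * delay steps, so tokens on the left finish first.
    """
    if max_t <= 0 or delay < 0:
        raise ValueError("max_t must be positive and delay non-negative")
    return [max_t + n * delay for n in range(num_tokens)]


def token_noise_levels(num_tokens: int, max_t: int, delay: int, step: int) -> List[float]:
    """Noise level of every token at a given global denoising step.

    All tokens start at level max_t at step 0 and decrease linearly to 0
    at their own finish step; after that they stay clean (level 0).
    """
    levels = []
    for finish in finish_steps(num_tokens, max_t, delay):
        remaining = max(finish - step, 0)  # steps this token still has to go
        levels.append(max_t * remaining / finish)
    return levels


if __name__ == "__main__":
    # Illustrative values: sequence length, max noise level, per-position delay.
    N, T, D = 4, 4, 2
    for k in range(max(finish_steps(N, T, D)) + 1):
        levels = token_noise_levels(N, T, D, k)
        clean = [n for n, lv in enumerate(levels) if lv == 0]
        print(f"step {k:2d}: " + " ".join(f"{lv:4.2f}" for lv in levels), "clean:", clean)
```

Running the script prints one row per global step. The leftmost column reaches 0 first, so under this assumed schedule a decoder could condition the still-noisy positions on the right on tokens that are already clean on the left.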
