Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Abstract

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale than their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into the diffusion models DiffuGPT and DiffuLLaMA, using fewer than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (with 127M, 355M, and 7B parameters) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions. The models are available at https://github.com/HKUNLP/DiffuLLaMA.
