Transfer Learning for Text Diffusion Models

Abstract

This report explores whether text diffusion can replace autoregressive (AR) decoding in the training and deployment of large language models (LLMs). In particular, we ask whether pretrained AR models can be converted into text diffusion models through a lightweight adaptation procedure we call "AR2Diff". We begin by establishing a strong baseline setup for training text diffusion models. Comparing multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, text diffusion underperforms the standard AR approach. On code synthesis and extractive QA, however, diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff, which adapts AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation.
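To make the "prefix LM" setting named in the abstract concrete, the following is a minimal sketch of the attention mask usually associated with that objective: positions in the input prefix attend to each other bidirectionally, while target positions attend to the whole prefix and only to earlier targets. This is a general illustration and not the authors' implementation; the function name `prefix_lm_mask` and the example lengths are hypothetical.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[bool]]:
    """Build a prefix-LM attention mask (illustrative sketch only).

    mask[i][j] is True when query position i may attend to key position j.
    Positions [0, prefix_len) form the prefix; the rest are target positions.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must lie in [0, total_len]")
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            key_in_prefix = j < prefix_len  # every position may see the full prefix
            key_not_future = j <= i         # ordinary causal constraint
            row.append(key_in_prefix or key_not_future)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Hypothetical sizes: 2 prefix tokens followed by 2 target tokens.
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print("".join("x" if allowed else "." for allowed in row))
```

With `prefix_len=0` the mask reduces to a standard causal mask, and with `prefix_len == total_len` it becomes fully bidirectional, which is why the prefix LM objective is often described as sitting between the two.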
