

Transfer Learning for Text Diffusion Models

January 30, 2024
Authors: Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant
cs.AI

Abstract

In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call "AR2Diff". We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, we find that text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, we find diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff -- adapting AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation.
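
The abstract does not spell out the AR2Diff procedure or the decoding schedule, so the toy sketch below is only meant to illustrate why diffusion-style decoding can be faster than AR decoding for long outputs: AR decoding needs one model call per generated token, whereas diffusion decoding runs a small, fixed number of denoising passes that predict all positions in parallel. The `toy_model` stand-in, the `MASK_ID` convention, and the confidence-based unmasking schedule are all assumptions for illustration, not the paper's actual method.

```python
# Illustrative sketch only: contrasts the decoding loops of autoregressive (AR)
# generation and non-autoregressive diffusion-style generation. The "model" is
# a random-logit stand-in, not the paper's architecture, and the unmasking
# schedule is a generic placeholder rather than the AR2Diff recipe.
import numpy as np

VOCAB_SIZE = 32
MASK_ID = 0          # reserved id for a still-masked position (assumed)
BOS_ID = 1           # start-of-sequence id for the AR loop (assumed)
SEQ_LEN = 16         # length of the sequence to generate

rng = np.random.default_rng(0)


def toy_model(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for a decoder network: returns logits of shape (len(tokens), VOCAB_SIZE)."""
    return rng.normal(size=(len(tokens), VOCAB_SIZE))


def ar_decode(seq_len: int) -> np.ndarray:
    """Autoregressive decoding: one forward pass per generated token (seq_len passes)."""
    tokens = [BOS_ID]
    for _ in range(seq_len):
        logits = toy_model(np.array(tokens))
        logits[-1, MASK_ID] = -np.inf            # never emit the mask token
        tokens.append(int(logits[-1].argmax()))
    return np.array(tokens[1:])


def diffusion_decode(seq_len: int, num_steps: int = 4) -> np.ndarray:
    """Diffusion-style decoding: a small fixed number of denoising passes, each
    predicting every position in parallel and committing the most confident
    ones (a generic confidence-based schedule, assumed for illustration)."""
    tokens = np.full(seq_len, MASK_ID)
    for step in range(num_steps):
        logits = toy_model(tokens)
        logits[:, MASK_ID] = -np.inf             # never emit the mask token
        preds = logits.argmax(axis=-1)
        conf = logits.max(axis=-1)
        k = int(seq_len * (step + 1) / num_steps)  # commit a growing fraction
        keep = np.argsort(-conf)[:k]
        tokens[keep] = preds[keep]
    return tokens


if __name__ == "__main__":
    print("AR decode        :", ar_decode(SEQ_LEN), " # 16 model calls")
    print("Diffusion decode :", diffusion_decode(SEQ_LEN), " # 4 model calls")
```

For a 16-token output the AR loop makes 16 model calls while the diffusion loop makes 4, and the gap widens with sequence length, which is the speed argument the abstract alludes to for long text generation.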