Transfer Learning for Text Diffusion Models
January 30, 2024
Authors: Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant
cs.AI
Abstract
In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call "AR2Diff". We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, we find that text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, we find diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff, adapting AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation.
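As a rough illustration of the speed claim in the last sentence (a minimal sketch, not taken from the paper): AR decoding requires one sequential forward pass per generated token, whereas diffusion decoding refines all output positions in parallel over a fixed number of denoising steps, so its sequential cost does not grow with output length. The step counts below are illustrative assumptions, not measurements reported by the authors.

```python
# Illustrative sketch only: comparing sequential decoding steps for
# autoregressive (AR) decoding vs. parallel diffusion-style decoding.
# The denoising step count (20) is an assumed value, not from the paper.

def ar_sequential_steps(output_length: int) -> int:
    # AR decoding emits one token per forward pass,
    # so sequential steps scale with output length.
    return output_length

def diffusion_sequential_steps(num_denoising_steps: int = 20) -> int:
    # Diffusion decoding updates all positions in parallel over a
    # fixed denoising schedule, independent of output length.
    return num_denoising_steps

for length in (32, 256, 1024):
    print(f"output length {length:5d}: "
          f"AR steps = {ar_sequential_steps(length):5d}, "
          f"diffusion steps = {diffusion_sequential_steps():5d}")
```

Under these assumptions the gap widens with output length, which is consistent with the abstract's point that diffusion decoding is most attractive for long text generation.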