UT5: Pretraining Non-autoregressive T5 with Unrolled Denoising
November 14, 2023
Authors: Mahmoud G. Salem, Jiayu Ye, Chu-Cheng Lin, Frederick Liu
cs.AI
Abstract
Recent advances in Transformer-based Large Language Models have made great
strides in natural language generation. However, to decode K tokens, an
autoregressive model needs K sequential forward passes, which can become a
performance bottleneck for large language models. Much non-autoregressive (NAR)
research aims to address this sequentiality bottleneck, although most of it has
focused on dedicated architectures evaluated on supervised benchmarks. In this
work, we study unsupervised pretraining for non-autoregressive T5 models via
unrolled denoising and show state-of-the-art (SoTA) results on downstream
generation tasks such as SQuAD question generation and XSum.
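
To make the sequentiality claim concrete, the following is a minimal, illustrative sketch (not the paper's code) contrasting the two decoding regimes: an autoregressive loop that needs K sequential forward passes to emit K tokens, versus a non-autoregressive model that emits all K positions from a single forward pass. The functions `ar_forward` and `nar_forward` are hypothetical stand-ins for a real Transformer decoder.

```python
# Illustrative sketch only: the "model" is a toy lookup, not a trained network.
from typing import List

VOCAB = ["<bos>", "the", "cat", "sat", "<eos>"]


def ar_forward(prefix: List[str]) -> str:
    """Hypothetical autoregressive step: predict the next token from the prefix."""
    return VOCAB[min(len(prefix), len(VOCAB) - 1)]


def nar_forward(length: int) -> List[str]:
    """Hypothetical NAR step: predict all positions at once given a target length."""
    return [VOCAB[min(i + 1, len(VOCAB) - 1)] for i in range(length)]


def decode_autoregressive(k: int) -> List[str]:
    tokens = ["<bos>"]
    for _ in range(k):              # K sequential forward passes
        tokens.append(ar_forward(tokens))
    return tokens[1:]


def decode_non_autoregressive(k: int) -> List[str]:
    return nar_forward(k)           # one forward pass for all K tokens


if __name__ == "__main__":
    print(decode_autoregressive(4))      # ['the', 'cat', 'sat', '<eos>']
    print(decode_non_autoregressive(4))  # ['the', 'cat', 'sat', '<eos>']
```

In practice the NAR pass trades the sequential dependency for the harder problem of predicting all positions jointly, which is the gap the unrolled-denoising pretraining studied in this work aims to close.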