

UT5: Pretraining Non-autoregressive T5 with Unrolled Denoising

November 14, 2023
Authors: Mahmoud G. Salem, Jiayu Ye, Chu-Cheng Lin, Frederick Liu
cs.AI

Abstract

Recent advances in Transformer-based Large Language Models have made great strides in natural language generation. However, to decode K tokens, an autoregressive model needs K sequential forward passes, which can become a performance bottleneck for large language models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, although most of it has focused on dedicated architectures for supervised benchmarks. In this work, we study unsupervised pretraining of non-autoregressive T5 models via unrolled denoising and show state-of-the-art results on downstream generation tasks such as SQuAD question generation and XSum.
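
To illustrate the sequentiality bottleneck the abstract describes, here is a minimal sketch (not the paper's implementation) contrasting autoregressive decoding, which needs K sequential forward passes to emit K tokens, with non-autoregressive decoding, which fills all K target positions in a single pass. The toy forward function, vocabulary size, and mask token are illustrative placeholders standing in for a real Transformer.

```python
# Toy contrast of autoregressive vs. non-autoregressive decoding.
# All names here (forward, VOCAB_SIZE, mask_id) are hypothetical stand-ins,
# not the UT5 codebase.
import numpy as np

VOCAB_SIZE = 32
rng = np.random.default_rng(0)
W = rng.normal(size=(VOCAB_SIZE, VOCAB_SIZE))  # toy "model parameters"


def forward(tokens: np.ndarray) -> np.ndarray:
    """Toy forward pass: returns one logit vector per input position."""
    one_hot = np.eye(VOCAB_SIZE)[tokens]   # (length, vocab)
    return one_hot @ W                      # (length, vocab) logits


def decode_autoregressive(prompt: np.ndarray, k: int) -> np.ndarray:
    """K sequential forward passes: step t consumes the tokens produced before it."""
    seq = list(prompt)
    for _ in range(k):                      # the sequential bottleneck
        logits = forward(np.array(seq))
        seq.append(int(logits[-1].argmax()))
    return np.array(seq[len(prompt):])


def decode_non_autoregressive(prompt: np.ndarray, k: int, mask_id: int = 0) -> np.ndarray:
    """One forward pass: all K target positions are masked and predicted in parallel."""
    inp = np.concatenate([prompt, np.full(k, mask_id)])
    logits = forward(inp)                   # single parallel pass
    return logits[len(prompt):].argmax(axis=-1)


prompt = np.array([3, 7, 11])
print("AR  output:", decode_autoregressive(prompt, k=5))
print("NAR output:", decode_non_autoregressive(prompt, k=5))
```

The single-pass variant trades the left-to-right dependency for speed, which is why NAR models typically rely on iterative refinement or, as studied in this paper, pretraining with unrolled denoising to recover generation quality.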