

UT5: Pretraining Non-autoregressive T5 with Unrolled Denoising

November 14, 2023
Authors: Mahmoud G. Salem, Jiayu Ye, Chu-Cheng Lin, Frederick Liu
cs.AI

Abstract

Recent advances in Transformer-based Large Language Models have made great strides in natural language generation. However, to decode K tokens, an autoregressive model needs K sequential forward passes, which can become a performance bottleneck for large language models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, although most of it has focused on dedicated architectures for supervised benchmarks. In this work, we study unsupervised pretraining of non-autoregressive T5 models via unrolled denoising and show state-of-the-art results on downstream generation tasks such as SQuAD question generation and XSum.
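
To illustrate the sequentiality bottleneck the abstract describes, here is a minimal sketch (not the paper's implementation) contrasting autoregressive decoding, which needs K sequential forward passes to emit K tokens, with non-autoregressive decoding, which fills all K target positions in a single pass. The toy forward function, vocabulary size, and mask token are illustrative placeholders standing in for a real Transformer.

```python
# Toy contrast of autoregressive vs. non-autoregressive decoding.
# All names here (forward, VOCAB_SIZE, mask_id) are hypothetical stand-ins,
# not the UT5 codebase.
import numpy as np

VOCAB_SIZE = 32
rng = np.random.default_rng(0)
W = rng.normal(size=(VOCAB_SIZE, VOCAB_SIZE))  # toy "model parameters"


def forward(tokens: np.ndarray) -> np.ndarray:
    """Toy forward pass: returns one logit vector per input position."""
    one_hot = np.eye(VOCAB_SIZE)[tokens]   # (length, vocab)
    return one_hot @ W                      # (length, vocab) logits


def decode_autoregressive(prompt: np.ndarray, k: int) -> np.ndarray:
    """K sequential forward passes: step t consumes the tokens produced before it."""
    seq = list(prompt)
    for _ in range(k):                      # the sequential bottleneck
        logits = forward(np.array(seq))
        seq.append(int(logits[-1].argmax()))
    return np.array(seq[len(prompt):])


def decode_non_autoregressive(prompt: np.ndarray, k: int, mask_id: int = 0) -> np.ndarray:
    """One forward pass: all K target positions are masked and predicted in parallel."""
    inp = np.concatenate([prompt, np.full(k, mask_id)])
    logits = forward(inp)                   # single parallel pass
    return logits[len(prompt):].argmax(axis=-1)


prompt = np.array([3, 7, 11])
print("AR  output:", decode_autoregressive(prompt, k=5))
print("NAR output:", decode_non_autoregressive(prompt, k=5))
```

The single-pass variant trades the left-to-right dependency for speed, which is why NAR models typically rely on iterative refinement or, as studied in this paper, pretraining with unrolled denoising to recover generation quality.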