SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
January 24, 2024
Authors: Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia DeSalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar
cs.AI
Abstract
Pre-training large language models is known to be extremely resource
intensive and often inefficient, under-utilizing the information
encapsulated in the training text sequences. In this paper, we present SpacTor,
a new training procedure consisting of (1) a hybrid objective combining span
corruption (SC) and replaced token detection (RTD), and (2) a two-stage
curriculum that optimizes the hybrid objective over the initial τ
iterations, then transitions to standard SC loss. We show empirically that the
effectiveness of the hybrid objective is tied to the two-stage pre-training
schedule, and provide extensive analysis on why this is the case. In our
experiments with encoder-decoder architectures (T5) on a variety of NLP tasks,
SpacTor-T5 yields the same downstream performance as standard SC pre-training,
while enabling a 50% reduction in pre-training iterations and 40% reduction in
total FLOPs. Alternatively, given the same amount of computing budget, we find
that SpacTor results in significantly improved downstream benchmark
performance.
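
For concreteness, the following is a minimal sketch of the two-stage curriculum described in the abstract: a hybrid SC + RTD objective for the first τ steps, followed by the standard span-corruption loss alone. The helper callables `sc_loss` and `rtd_loss` and the weighting `lambda_rtd` are illustrative assumptions, not the authors' implementation.

```python
from typing import Any, Callable

def spactor_loss(
    batch: Any,
    step: int,
    tau: int,
    sc_loss: Callable[[Any], float],   # span-corruption loss for this batch (assumed helper)
    rtd_loss: Callable[[Any], float],  # replaced-token-detection loss (assumed helper)
    lambda_rtd: float = 1.0,           # RTD weight; the value here is an assumption
) -> float:
    """Two-stage curriculum: hybrid SC + RTD loss for the first tau steps, then SC only."""
    loss = sc_loss(batch)
    if step < tau:
        # Stage 1: add the ELECTRA-style replaced-token-detection term to the SC loss.
        loss += lambda_rtd * rtd_loss(batch)
    # Stage 2 (step >= tau): standard span-corruption loss only.
    return loss
```

The point of the sketch is the schedule itself: the RTD term is only active during the initial τ iterations, after which training reverts to the plain SC objective, which the authors report is what makes the hybrid objective effective.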