Improving Language Plasticity via Pretraining with Active Forgetting
July 3, 2023
Authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
cs.AI
Abstract
Pretrained language models (PLMs) are today the primary model for natural
language processing. Despite their impressive downstream performance, it can be
difficult to apply PLMs to new languages, a barrier to making their
capabilities universally accessible. While prior work has shown that this issue can be addressed by learning a new embedding layer for the new language, doing so is both data- and compute-inefficient. We propose to use an active
forgetting mechanism during pretraining, as a simple way of creating PLMs that
can quickly adapt to new languages. Concretely, by resetting the embedding
layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, similar
to a meta-learning effect. Experiments with RoBERTa show that models pretrained
with our forgetting mechanism not only demonstrate faster convergence during
language adaptation but also outperform standard ones in a low-data regime,
particularly for languages that are distant from English.
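The core mechanism described above, periodically re-initializing the embedding layer while the rest of the network keeps training, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a HuggingFace/PyTorch-style masked-LM setup, and the names `model`, `optimizer`, `pretraining_batches`, and the reset interval `K` are placeholders for whatever the actual training loop uses.

```python
import torch.nn as nn

K = 1000  # hypothetical reset interval, in optimizer updates

def reset_embeddings(embedding: nn.Embedding) -> None:
    """Re-initialize token embedding weights, discarding what was learned so far."""
    nn.init.normal_(embedding.weight, mean=0.0, std=0.02)

# Assumed to exist: `model` (a masked LM), `optimizer`, and an iterable
# `pretraining_batches` yielding tokenized batches.
for step, batch in enumerate(pretraining_batches, start=1):
    loss = model(**batch).loss  # standard masked-language-modeling loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Active forgetting: every K updates, reset only the embedding layer.
    # The transformer body keeps its weights, so it is repeatedly forced to
    # work with fresh embeddings and learns to re-pair with them quickly.
    if step % K == 0:
        reset_embeddings(model.get_input_embeddings())
```

At language-adaptation time, the same idea applies in reverse: the body is frozen (or lightly tuned) while a new embedding layer for the target language is learned, which is where the meta-learning-like benefit of the forgetting pretraining is meant to show up.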