Improving Language Plasticity via Pretraining with Active Forgetting

July 3, 2023
Authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
cs.AI

Abstract

Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.
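As a rough illustration of the mechanism described in the abstract, the sketch below re-initializes a token-embedding layer every K optimizer updates inside an otherwise standard training loop. This is not the authors' implementation: the tiny model, random data, toy objective, the value of K, and the choice of reset initialization (and the fact that optimizer state is left untouched) are all placeholder assumptions made for illustration.

```python
# Minimal sketch of "active forgetting" during pretraining (not the paper's code):
# the token-embedding layer is re-initialized every K optimizer updates while the
# transformer body keeps training normally. Sizes, data, and K are toy placeholders.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, K = 1000, 64, 100  # hypothetical vocabulary, width, reset interval


class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)  # the layer that gets reset
        self.body = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, ids):
        return self.lm_head(self.body(self.embed(ids)))


model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(1, 1001):                          # stand-in for the pretraining loop
    ids = torch.randint(0, VOCAB_SIZE, (8, 16))      # random "tokens" as dummy data
    logits = model(ids)
    loss = nn.functional.cross_entropy(              # toy objective, not masked LM
        logits.view(-1, VOCAB_SIZE), ids.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % K == 0:
        # Active forgetting: throw away the learned embeddings every K updates.
        # The std and the decision not to reset AdamW state are assumptions here.
        nn.init.normal_(model.embed.weight, std=0.02)
```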