

Improving Language Plasticity via Pretraining with Active Forgetting

July 3, 2023
Authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
cs.AI

Abstract

Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.
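
To make the mechanism concrete, here is a minimal PyTorch sketch of the periodic embedding reset described in the abstract. The model, hyperparameters (K_RESET, TOTAL_STEPS), toy objective, and the make_batch helper are illustrative stand-ins, not the paper's actual RoBERTa pretraining setup.

```python
# Sketch of active forgetting: periodically re-initialize the token embedding
# layer during pretraining so the transformer body learns to work with fresh
# embeddings. All names and sizes below are hypothetical.

import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, K_RESET, TOTAL_STEPS = 1000, 64, 100, 500


class TinyMLM(nn.Module):
    """Toy masked-LM stand-in: embedding layer + transformer body + LM head."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, token_ids):
        return self.lm_head(self.body(self.embed(token_ids)))


def make_batch(batch_size=8, seq_len=16):
    """Random token ids standing in for a real pretraining batch."""
    return torch.randint(0, VOCAB_SIZE, (batch_size, seq_len))


model = TinyMLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1, TOTAL_STEPS + 1):
    tokens = make_batch()
    logits = model(tokens)
    # Toy objective: predict the input tokens themselves.
    loss = loss_fn(logits.view(-1, VOCAB_SIZE), tokens.view(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Active forgetting: every K updates, reset the embedding weights while
    # leaving the transformer body intact, forcing the body to stay robust
    # to newly initialized embeddings.
    if step % K_RESET == 0:
        model.embed.reset_parameters()
```

Whether the optimizer state associated with the embedding is also cleared at each reset is a detail the abstract does not specify; the sketch above leaves AdamW's state untouched.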