

Unlocking Continual Learning Abilities in Language Models

June 25, 2024
作者: Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu
cs.AI

Abstract

Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address this issue by incorporating old task data or task-wise inductive biases into LMs. However, old data and accurate task information are often unavailable or costly to collect, limiting the applicability of current CL approaches to LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that updates only the model parameters with large output magnitudes in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output of LMs' linear layers differs when the model processes different task data. By imposing this simple constraint on the gradient-update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance in both continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines on a 15-task CL benchmark. MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.
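The core idea described above can be illustrated with a minimal sketch: compute the L1-normalized magnitude of each output unit of a linear layer, keep only the units with the largest magnitudes, and apply the gradient update only to the weight rows feeding those units. The function names (`migu_mask`, `masked_update`) and the `keep_ratio` parameter are illustrative assumptions, not the paper's actual API; the real method operates inside a training loop over LM linear layers.

```python
import numpy as np

def migu_mask(output, keep_ratio=0.5):
    """Return a boolean mask over output units, keeping the
    top-`keep_ratio` fraction by L1-normalized magnitude.
    `keep_ratio` is a hypothetical knob for this sketch."""
    mag = np.abs(output).mean(axis=0)      # per-unit output magnitude
    mag = mag / mag.sum()                  # L1 normalization
    k = max(1, int(keep_ratio * mag.size))
    thresh = np.sort(mag)[-k]              # k-th largest normalized magnitude
    return mag >= thresh

def masked_update(W, grad_W, output, lr=0.1, keep_ratio=0.5):
    """Gradient step that only touches weight rows whose output
    units had large magnitude (MIGU-style constraint, sketched)."""
    mask = migu_mask(output, keep_ratio)
    W_new = W.copy()
    W_new[mask] -= lr * grad_W[mask]       # rows correspond to output units
    return W_new
```

In a real setting the mask would be recomputed per batch from the linear layer's forward activations, so the set of updated parameters shifts with the task data, which is what lets the constraint act as an implicit task separator without rehearsal data or task labels.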
