
Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

December 4, 2025
Authors: Atsuki Yamaguchi, Terufumi Morishita, Aline Villavicencio, Nikolaos Aletras
cs.AI

Abstract

Expanding the linguistic diversity of instruct large language models (LLMs) is crucial for global accessibility but is often hindered by the reliance on costly specialized target language labeled data and catastrophic forgetting during adaptation. We tackle this challenge under a realistic, low-resource constraint: adapting instruct LLMs using only unlabeled target language data. We introduce Source-Shielded Updates (SSU), a selective parameter update strategy that proactively preserves source knowledge. Using a small set of source data and a parameter importance scoring method, SSU identifies parameters critical to maintaining source abilities. It then applies a column-wise freezing strategy to protect these parameters before adaptation. Experiments across five typologically diverse languages and 7B and 13B models demonstrate that SSU successfully mitigates catastrophic forgetting. It reduces performance degradation on monolingual source tasks to just 3.4% (7B) and 2.8% (13B) on average, a stark contrast to the 20.3% and 22.3% from full fine-tuning. SSU also achieves target-language performance highly competitive with full fine-tuning, outperforming it on all benchmarks for 7B models and the majority for 13B models.
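The abstract describes SSU only at a high level: score parameter importance on a small set of source-language data, then freeze whole weight columns deemed critical to source abilities before adapting on target-language data. The sketch below illustrates that column-wise freezing idea in PyTorch; the gradient-magnitude importance score, the fixed freezing fraction, and the gradient-masking hook are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of column-wise freezing for source-shielded updates.
# Assumptions: importance = mean |gradient| on a small source batch;
# a fixed fraction of columns is frozen via a gradient-masking hook.
import torch
import torch.nn as nn


def column_importance(layer: nn.Linear, source_batch, loss_fn) -> torch.Tensor:
    """Score each weight column by mean |gradient| on a small source-language batch."""
    inputs, targets = source_batch
    layer.zero_grad()
    loss = loss_fn(layer(inputs), targets)
    loss.backward()
    # weight has shape (out_features, in_features); one score per input column
    return layer.weight.grad.abs().mean(dim=0)


def shield_columns(layer: nn.Linear, scores: torch.Tensor, freeze_frac: float = 0.5) -> None:
    """Zero out future gradient updates for the most source-critical weight columns."""
    k = int(freeze_frac * scores.numel())
    frozen_cols = torch.topk(scores, k).indices
    mask = torch.ones_like(layer.weight)
    mask[:, frozen_cols] = 0.0  # shielded columns receive no updates
    layer.weight.register_hook(lambda grad: grad * mask)


# Usage sketch: score once on a small source batch, then adapt on target data as usual.
layer = nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
scores = column_importance(layer, (x, y), nn.CrossEntropyLoss())
shield_columns(layer, scores, freeze_frac=0.5)
```

In this sketch the mask is applied through a backward hook, so any standard optimizer step leaves the shielded columns untouched while the remaining parameters adapt to the target language.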