Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
September 20, 2025
Authors: Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan
cs.AI
Abstract
Large language models (LLMs) acquire substantial world knowledge during
pre-training, which is further shaped by post-training techniques such as
supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge
remains underexplored, limiting our ability to control knowledge change
behavior in fine-tuned models. To address this gap, we evaluate closed-book
question answering (CBQA) performance across five LLMs from the LLaMA-2 and
LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up
to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying
the level of knowledge mastery in the fine-tuning data leads to performance
fluctuations of over 12%. To investigate these effects, we analyze model
behavior at both the token and parameter levels. Our analysis reveals that up
to 90% of parameter updates during SFT do not contribute to knowledge
enhancement. Restoring these updates can improve performance on the CBQA task,
depending on the characteristics of the fine-tuning data. These insights offer
practical guidance for developing fine-tuning strategies that more effectively
strengthen model knowledge.
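The abstract reports that up to 90% of SFT parameter updates do not contribute to knowledge enhancement, and that restoring those updates (i.e., reverting the affected parameters to their pre-trained values) can help. As a minimal sketch of how such a selective revert could be implemented, the snippet below keeps only a small fraction of updates and restores the rest. The magnitude-based selection rule here is purely illustrative; the abstract does not specify how the paper identifies non-contributing updates.

```python
import numpy as np

def revert_updates(pre: np.ndarray, sft: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Keep only a fraction of fine-tuning updates; revert the rest.

    `pre` and `sft` are flat parameter vectors before and after SFT.
    Hypothetical criterion: retain the largest-magnitude updates and
    restore all other parameters to their pre-trained values.
    """
    delta = sft - pre
    k = max(1, int(keep_fraction * delta.size))
    # Indices of the k largest-magnitude updates (unordered partition).
    keep_idx = np.argpartition(np.abs(delta), -k)[-k:]
    restored = pre.copy()
    restored[keep_idx] = sft[keep_idx]
    return restored

# Toy example: 10 parameters, keep the top 20% of updates.
rng = np.random.default_rng(0)
pre = rng.normal(size=10)
sft = pre + rng.normal(scale=0.1, size=10)
restored = revert_updates(pre, sft, keep_fraction=0.2)
```

In the toy example, exactly 2 of the 10 parameters retain their fine-tuned values, while the remaining 8 are reverted to `pre`. For a real LLM the same idea would be applied per weight tensor (e.g., across a model's state dict) rather than to one flat vector.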