Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
September 20, 2025
Authors: Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan
cs.AI
Abstract
Large language models (LLMs) acquire substantial world knowledge during
pre-training, which is further shaped by post-training techniques such as
supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge
remains underexplored, limiting our ability to control knowledge change
behavior in fine-tuned models. To address this gap, we evaluate closed-book
question answering (CBQA) performance across five LLMs from the LLaMA-2 and
LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up
to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying
the level of knowledge mastery in the fine-tuning data leads to performance
fluctuations of over 12%. To investigate these effects, we analyze model
behavior at both the token and parameter levels. Our analysis reveals that up
to 90% of parameter updates during SFT do not contribute to knowledge
enhancement. Restoring these updates can improve performance on the CBQA task,
depending on the characteristics of the fine-tuning data. These insights offer
practical guidance for developing fine-tuning strategies that more effectively
strengthen model knowledge.
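The abstract reports that up to 90% of SFT parameter updates do not contribute to knowledge enhancement, and that restoring those updates (i.e., reverting the affected parameters to their pre-trained values) can help. As a minimal sketch of how such a selective revert could be implemented, the snippet below keeps only a small fraction of updates and restores the rest. The magnitude-based selection rule here is purely illustrative; the abstract does not specify how the paper identifies non-contributing updates.

```python
import numpy as np

def revert_updates(pre: np.ndarray, sft: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Keep only a fraction of fine-tuning updates; revert the rest.

    `pre` and `sft` are flat parameter vectors before and after SFT.
    Hypothetical criterion: retain the largest-magnitude updates and
    restore all other parameters to their pre-trained values.
    """
    delta = sft - pre
    k = max(1, int(keep_fraction * delta.size))
    # Indices of the k largest-magnitude updates (unordered partition).
    keep_idx = np.argpartition(np.abs(delta), -k)[-k:]
    restored = pre.copy()
    restored[keep_idx] = sft[keep_idx]
    return restored

# Toy example: 10 parameters, keep the top 20% of updates.
rng = np.random.default_rng(0)
pre = rng.normal(size=10)
sft = pre + rng.normal(scale=0.1, size=10)
restored = revert_updates(pre, sft, keep_fraction=0.2)
```

In the toy example, exactly 2 of the 10 parameters retain their fine-tuned values, while the remaining 8 are reverted to `pre`. For a real LLM the same idea would be applied per weight tensor (e.g., across a model's state dict) rather than to one flat vector.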