
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

September 20, 2025
作者: Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan
cs.AI

Abstract

Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned models. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.
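The restoration step described above can be sketched in plain Python. This is a minimal illustration under assumptions the abstract does not specify: the helper `restore_updates` and its magnitude-based criterion for deciding which SFT updates to revert are hypothetical, standing in for whatever token- and parameter-level analysis the paper actually uses. Each weight is reverted to its pre-trained value unless its update ranks among the largest by magnitude.

```python
def restore_updates(pretrained, finetuned, keep_fraction=0.1):
    """Revert most SFT parameter updates to their pre-trained values.

    pretrained / finetuned: dicts mapping parameter names to scalar weights
    (a toy stand-in for full weight tensors). Only the top `keep_fraction`
    of updates by absolute magnitude are kept; the rest are restored.
    """
    # Compute the SFT update (delta) for every parameter.
    deltas = {name: finetuned[name] - pretrained[name] for name in pretrained}

    # Rank parameters by update magnitude, largest first.
    ranked = sorted(deltas, key=lambda name: abs(deltas[name]), reverse=True)
    kept = set(ranked[: max(1, int(len(ranked) * keep_fraction))])

    # Keep the large updates; restore everything else to pre-trained values.
    return {
        name: finetuned[name] if name in kept else pretrained[name]
        for name in pretrained
    }
```

In practice this would operate on model weight tensors rather than scalars, and the paper conditions the benefit of restoration on the characteristics of the fine-tuning data, which this sketch does not model.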