LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

Abstract

Recent studies have shown that supervised fine-tuning of LLMs on a small number of high-quality datasets can yield strong reasoning capabilities. However, full fine-tuning (Full FT), while powerful, is computationally expensive and susceptible to overfitting and catastrophic forgetting, particularly when data is limited. Sparse fine-tuning, which previously achieved notable success by updating only a small subset of model parameters, offers a promising trade-off between efficiency and effectiveness. Yet it has lagged behind in the LLM era because of the difficulty of identifying the parameters that are truly critical for reasoning. In this work, we argue that the weights with the largest magnitude after low-rank approximation are the critical weights for fine-tuning; we call them Principal Weights. Surprisingly, although magnitude-based sparse fine-tuning performs poorly as a baseline for LLM fine-tuning, it becomes highly effective after rank reduction. These insights motivate our method, Low-rank Informed Sparse Fine-Tuning (LIFT). LIFT updates only the top 5% of Principal Weights throughout training and consistently outperforms Full FT on reasoning tasks, while keeping memory efficiency on par with popular parameter-efficient fine-tuning methods. In addition to strong performance on target domains such as arithmetic reasoning, LIFT retains up to 20% more source-domain knowledge than Full FT and LoRA. Our code is available at: https://github.com/zihanghliu/LIFT.
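The selection rule described above can be sketched for a single weight matrix. This is a minimal NumPy illustration under our own assumptions, not the LIFT implementation from the repository: the helper names (`low_rank_approx`, `principal_weight_mask`, `masked_sgd_step`), the rank `r = 8`, the plain SGD step, and computing the mask once from a random matrix are all hypothetical choices for illustration. The sketch takes the best rank-r approximation of the matrix via truncated SVD, marks the 5% of entries with the largest magnitude in that approximation as Principal Weights, and masks the gradient so that only those entries of the original matrix are updated.

```python
import numpy as np


def low_rank_approx(weight: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of `weight` via truncated SVD (hypothetical helper)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Scale the leading left singular vectors by their singular values, then recombine.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def principal_weight_mask(weight: np.ndarray, rank: int, density: float = 0.05) -> np.ndarray:
    """Boolean mask of the `density` fraction of entries with the largest
    magnitude in the rank-reduced matrix (hypothetical helper)."""
    approx = low_rank_approx(weight, rank)
    k = max(1, int(round(density * weight.size)))
    # Flat indices of the k largest |entries| of the low-rank approximation.
    top = np.argpartition(np.abs(approx).ravel(), -k)[-k:]
    mask = np.zeros(weight.size, dtype=bool)
    mask[top] = True
    return mask.reshape(weight.shape)


def masked_sgd_step(weight: np.ndarray, grad: np.ndarray, mask: np.ndarray,
                    lr: float = 1e-3) -> np.ndarray:
    """One plain SGD step that changes only the masked entries; the rest stay frozen."""
    return weight - lr * grad * mask


# Usage on a toy 64 x 128 matrix standing in for one pretrained layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
mask = principal_weight_mask(W, rank=8, density=0.05)
print(int(mask.sum()))  # 410, i.e. round(0.05 * 64 * 128)

G = rng.standard_normal(W.shape)  # stand-in for a loss gradient
W_new = masked_sgd_step(W, G, mask)
```

Because the mask is a fixed boolean array, entries outside it never change, which is how the update leaves roughly 95% of the matrix frozen. Replacing `low_rank_approx(weight, rank)` with `weight` itself turns this sketch into the plain magnitude-based sparse fine-tuning baseline that the abstract reports as performing poorly.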

