LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
June 1, 2025
作者: Zihang Liu, Tianyu Pang, Oleg Balabanov, Chaoqun Yang, Tianjin Huang, Lu Yin, Yaoqing Yang, Shiwei Liu
cs.AI
Abstract
Recent studies have shown that supervised fine-tuning of LLMs on a small
number of high-quality datasets can yield strong reasoning capabilities.
However, full fine-tuning (Full FT), while powerful, is computationally
expensive and susceptible to overfitting and catastrophic forgetting,
particularly when data is limited. Sparse fine-tuning, which previously
achieved notable success by updating only a small subset of model parameters,
offers a promising trade-off between efficiency and effectiveness. Yet, it has
lagged behind in the LLM era due to the difficulty of identifying parameters
truly critical for reasoning. In this work, we state that weights with the
largest magnitude after low-rank approximation are critical weights for
fine-tuning, which we call Principal Weights. Surprisingly, while
magnitude-based sparse fine-tuning performs poorly as a baseline on LLM
fine-tuning, it becomes highly effective after rank reduction. These insights
motivate our method: Low-rank Informed Sparse Fine-Tuning (LIFT). LIFT only
updates the top 5% Principal Weights throughout training and consistently
achieves better performance on reasoning tasks than Full FT, while maintaining
memory efficiency on par with popular parameter-efficient fine-tuning methods.
In addition to strong performance on target domains such as arithmetic
reasoning, LIFT also retains up to 20% more source-domain knowledge, compared
to Full FT and LoRA. Our code is available at:
https://github.com/zihanghliu/LIFT.
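Since the abstract outlines the core selection rule (take the entries of a weight matrix with the largest magnitude after low-rank approximation, then restrict updates to that sparse subset), a minimal PyTorch sketch of that idea follows. This is not the authors' released implementation (see the repository linked above for the official code); the rank, sparsity level, and function names here are illustrative assumptions.

```python
# A minimal sketch of principal-weight selection as described in the abstract,
# assuming PyTorch; the rank and density defaults are illustrative, not the
# paper's settings.
import torch

def principal_weight_mask(weight: torch.Tensor, rank: int = 16, density: float = 0.05) -> torch.Tensor:
    """Mask the top `density` fraction of entries by magnitude of a rank-`rank`
    approximation of `weight` (the "Principal Weights")."""
    # Truncated SVD gives the low-rank approximation W_r = U_r diag(S_r) V_r^T.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    W_r = (U[:, :rank] * S[:rank]) @ Vh[:rank, :]
    # Keep the entries whose low-rank magnitude is among the top `density` fraction.
    k = max(1, int(density * weight.numel()))
    threshold = torch.topk(W_r.abs().flatten(), k).values.min()
    return W_r.abs() >= threshold

def masked_update(weight: torch.Tensor, mask: torch.Tensor, lr: float = 1e-5) -> None:
    """Apply a gradient step only to the masked (principal) entries."""
    with torch.no_grad():
        weight -= lr * weight.grad * mask
```

The abstract states that only the top 5% Principal Weights are updated throughout training; whether the mask is computed once before fine-tuning or periodically refreshed is a detail left to the official implementation.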