SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
June 19, 2025
Authors: Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu
cs.AI
Abstract
Fine-tuning LLMs is both computationally and memory-intensive. While
parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the
number of trainable parameters and lower memory usage, they do not decrease
computational cost. In some cases, they may even slow down fine-tuning. In this
paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning
through contextual sparsity. We propose a lightweight, training-free SVD
sparsity estimator that dynamically selects a sparse subset of weights for loss
and gradient computation. Additionally, we systematically analyze and address
sensitivity across layers, tokens, and training steps. Our experimental results
show that SparseLoRA reduces computational cost by up to 2.2 times and delivers
a measured speedup of up to 1.6 times while maintaining accuracy across various
downstream tasks, including commonsense and arithmetic reasoning, code
generation, and instruction following.
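To make the idea of a training-free SVD sparsity estimator concrete, the sketch below (Python/PyTorch) shows one way a low-rank proxy of a linear layer's weight could score and select output channels per batch of tokens, so that only those channels are computed. The class and parameter names (SVDSparsityEstimator, rank, keep_ratio) are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch (assumed names, not the paper's code): a training-free,
# SVD-based estimator of contextual sparsity for a linear projection.
import torch


class SVDSparsityEstimator:
    def __init__(self, weight: torch.Tensor, rank: int = 8):
        # One-time low-rank factorization W ~= A @ B (training-free).
        U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
        self.A = (U[:, :rank] * S[:rank]).to(weight.dtype)  # [out, rank]
        self.B = Vh[:rank].to(weight.dtype)                  # [rank, in]

    def select_channels(self, x: torch.Tensor, keep_ratio: float = 0.5):
        # Cheap proxy output x @ B^T @ A^T; rank output channels by magnitude.
        proxy = (x @ self.B.T) @ self.A.T                    # [tokens, out]
        scores = proxy.abs().mean(dim=0)                     # per-channel importance
        k = max(1, int(keep_ratio * scores.numel()))
        return torch.topk(scores, k).indices                 # kept channel indices


def sparse_linear(x, weight, bias, estimator, keep_ratio=0.5):
    # Compute only the selected output channels (dense gather shown for clarity;
    # a fused kernel would skip the zero-filled output entirely).
    idx = estimator.select_channels(x, keep_ratio)
    out = torch.zeros(x.shape[0], weight.shape[0], dtype=x.dtype, device=x.device)
    out[:, idx] = x @ weight[idx].T + (bias[idx] if bias is not None else 0)
    return out


# Usage: estimate sparsity from the current tokens, then run the sparse projection.
W = torch.randn(4096, 4096) * 0.02
est = SVDSparsityEstimator(W, rank=8)
x = torch.randn(16, 4096)
y = sparse_linear(x, W, None, est, keep_ratio=0.25)

The low-rank proxy is the key design choice: it lets the channel selection itself stay far cheaper than the dense projection it replaces, which is what allows the sparsity to translate into a net speedup during fine-tuning.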