
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

June 19, 2025
作者: Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu
cs.AI

Abstract

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2 times and achieves a measured speedup of up to 1.6 times while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
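To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of how a training-free, SVD-based contextual sparsity estimator might pick a sparse subset of a layer's output channels for the current input. The class and function names, the rank, and the keep_ratio are illustrative assumptions.

```python
# Minimal sketch of SVD-based contextual sparsity for one linear layer.
# Assumes a frozen weight W of shape (d_out, d_in); names are illustrative.
import torch


class SVDSparsityEstimator:
    """Training-free estimator: a rank-r approximation of W cheaply
    predicts which output channels matter for the current input."""

    def __init__(self, weight: torch.Tensor, rank: int = 8):
        # Truncated SVD of the frozen weight, computed once offline.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.low_rank_in = S[:rank, None] * Vh[:rank]   # (r, d_in)
        self.low_rank_out = U[:, :rank]                 # (d_out, r)

    @torch.no_grad()
    def select_channels(self, x: torch.Tensor, keep_ratio: float = 0.5):
        # Approximate |W x| per output channel with the rank-r factors,
        # then keep the largest-magnitude channels for this batch of tokens.
        approx = self.low_rank_out @ (self.low_rank_in @ x.T)  # (d_out, n_tokens)
        scores = approx.abs().mean(dim=-1)                     # per-channel importance
        k = max(1, int(keep_ratio * scores.numel()))
        return torch.topk(scores, k).indices


def sparse_forward(x, weight, bias, estimator, keep_ratio=0.5):
    # Only the selected rows of W participate in the main matmul,
    # cutting FLOPs in the forward pass and the gradients that flow through it.
    idx = estimator.select_channels(x, keep_ratio)
    y = torch.zeros(x.shape[0], weight.shape[0], device=x.device, dtype=x.dtype)
    y[:, idx] = x @ weight[idx].T + (bias[idx] if bias is not None else 0)
    return y
```

The estimator itself costs only about r(d_in + d_out) multiply-adds per token, so with a small rank its overhead stays negligible relative to the full matmul it prunes.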