SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

Abstract

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods such as QLoRA and DoRA reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost; in some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2× and achieves a measured speedup of up to 1.6× while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
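The abstract describes a training-free SVD estimator that picks a sparse subset of weights for each input. Below is a minimal NumPy sketch of that general idea, written under our own assumptions rather than taken from the paper: a truncated SVD of one frozen linear layer's weight gives a cheap low-rank proxy for its output, the proxy scores each output channel by its L2 norm over the tokens in the batch, and only the top `keep` channels are then computed exactly. The function names, the norm-based scoring rule, and the `rank`/`keep` parameters are hypothetical. The sketch also leaves out the LoRA adapter branch, the backward pass, and the layer-, token-, and step-level sensitivity handling mentioned above.

```python
import numpy as np


def build_svd_estimator(weight, rank):
    """Precompute a rank-`rank` factorization A @ B of `weight` (shape: out x in).

    Hypothetical helper: done once per frozen layer, so no training is needed.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor (out x rank)
    b = vt[:rank, :]            # right factor (rank x in)
    return a, b


def select_channels(a, b, x, keep):
    """Cheaply estimate output-channel importance and return the top `keep` indices.

    x has shape (tokens x in). Scoring by the L2 norm of each estimated output
    channel across all tokens is an assumption made for this sketch.
    """
    approx = (x @ b.T) @ a.T                    # (tokens x out) low-rank proxy output
    scores = np.linalg.norm(approx, axis=0)     # one score per output channel
    top = np.argsort(scores)[::-1][:keep]       # highest-scoring channels first
    return np.sort(top)


def sparse_forward(weight, x, idx):
    """Compute exact outputs only for the selected channels; the rest stay zero."""
    out = np.zeros((x.shape[0], weight.shape[0]), dtype=np.result_type(x, weight))
    out[:, idx] = x @ weight[idx].T             # only `keep` rows of the weight are used
    return out


# Toy usage with arbitrary sizes.
rng = np.random.default_rng(0)
weight = rng.standard_normal((128, 64))
x = rng.standard_normal((16, 64))
a, b = build_svd_estimator(weight, rank=8)
idx = select_channels(a, b, x, keep=32)
y = sparse_forward(weight, x, idx)
```

Per token, the estimator costs about `rank * (d_in + d_out)` multiply-accumulates, versus `d_in * d_out` for the dense layer, so a proxy like this only pays off when the rank is small relative to the layer width.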
