GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Abstract

Low-Rank Adaptation (LoRA) is a popular method for parameter-efficient fine-tuning (PEFT) of generative models, valued for its simplicity and effectiveness. Despite recent enhancements, LoRA still suffers from a fundamental limitation: it overfits when the bottleneck is widened. It performs best at ranks 32-64, but its accuracy stagnates or declines at higher ranks and still falls short of full fine-tuning (FFT). We identify the root cause as LoRA's structural bottleneck, which entangles gradients across unrelated input channels and distorts gradient propagation. To address this, we introduce Granular Low-Rank Adaptation (GraLoRA), a novel structure that partitions weight matrices into sub-blocks, each with its own low-rank adapter. At negligible additional computational and storage cost, GraLoRA overcomes LoRA's limitations, increases representational capacity, and more closely approximates FFT behavior. Experiments on code generation and commonsense reasoning benchmarks show that GraLoRA consistently outperforms LoRA and other baselines, achieving an absolute Pass@1 gain of up to +8.5% on HumanEval+. These improvements hold across model sizes and rank settings, making GraLoRA a scalable and robust solution for PEFT. Code, data, and scripts are available at https://github.com/SqueezeBits/GraLoRA.git
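
The core structural idea, replacing one low-rank adapter with a grid of per-block adapters, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (the repository linked above is the reference): it assumes a square k x k grid of sub-blocks, assumes each sub-block adapter has rank r/k so that the trainable parameter count equals that of LoRA at rank r, and uses random factors in place of trained ones. The helper names `lora_delta`, `gralora_delta`, `param_count_lora`, and `param_count_gralora` are hypothetical.

```python
import numpy as np


def lora_delta(m, n, r, rng):
    """Standard LoRA weight update: Delta W = B @ A, with B (m x r) and A (r x n)."""
    B = rng.standard_normal((m, r))
    A = rng.standard_normal((r, n))
    return B @ A


def gralora_delta(m, n, r, k, rng):
    """Hypothetical GraLoRA-style update (sketch, not the official code).

    Splits the m x n update into a k x k grid of sub-blocks and gives every
    sub-block its own low-rank adapter B_ij @ A_ij. The per-block rank r // k
    is an assumption chosen so the parameter count matches LoRA at rank r.
    """
    assert m % k == 0 and n % k == 0 and r % k == 0
    bm, bn, br = m // k, n // k, r // k
    delta = np.zeros((m, n))
    for i in range(k):          # row block: a slice of output channels
        for j in range(k):      # column block: a slice of input channels
            B_ij = rng.standard_normal((bm, br))
            A_ij = rng.standard_normal((br, bn))
            delta[i * bm:(i + 1) * bm, j * bn:(j + 1) * bn] = B_ij @ A_ij
    return delta


def param_count_lora(m, n, r):
    # B has m * r entries, A has r * n entries.
    return r * (m + n)


def param_count_gralora(m, n, r, k):
    # k * k blocks, each with B_ij (m/k x r/k) and A_ij (r/k x n/k).
    return k * k * (r // k) * (m // k + n // k)


# Usage: same trainable-parameter budget, different achievable rank.
rng = np.random.default_rng(0)
m, n, r, k = 64, 64, 8, 4
print(param_count_lora(m, n, r), param_count_gralora(m, n, r, k))  # 1024 1024
print(np.linalg.matrix_rank(lora_delta(m, n, r, rng)))             # 8
print(np.linalg.matrix_rank(gralora_delta(m, n, r, k, rng)))       # 32 = k * r for generic random factors
```

With the same 1,024 trainable parameters, the block-wise update can reach rank k * r instead of r, which is one concrete reading of the "increased representational capacity" claim. Because each pair (B_ij, A_ij) only multiplies input slice j, its gradient depends only on that slice of the activations, which illustrates how the partitioning keeps unrelated input channels from sharing a single adapter's gradient.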
