Orthogonal Finetuning Made Scalable
June 24, 2025
Authors: Zeju Qiu, Weiyang Liu, Adrian Weller, Bernhard Schölkopf
cs.AI
Abstract
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation
while preventing catastrophic forgetting, but its high runtime and memory
demands limit practical deployment. We identify the core computational
bottleneck in OFT as its weight-centric implementation, which relies on costly
matrix-matrix multiplications with cubic complexity. To overcome this, we
propose OFTv2, an input-centric reformulation that instead uses matrix-vector
multiplications (i.e., matrix-free computation), reducing the computational
cost to quadratic. We further introduce the Cayley-Neumann parameterization, an
efficient orthogonal parameterization that approximates the matrix inversion in
the Cayley transform via a truncated Neumann series. These modifications allow
OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage
without compromising performance. In addition, we extend OFTv2 to support
finetuning quantized foundation models and show that it outperforms the popular
QLoRA in training stability, efficiency, and memory usage.
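To make these two ideas concrete, the sketch below is a minimal illustration (not the authors' released implementation). It shows (a) a Cayley-Neumann-style parameterization that replaces the matrix inverse in the Cayley transform with a truncated Neumann series, and (b) the contrast between a weight-centric forward pass, which materializes the adapted weight with a cubic-cost matrix-matrix product, and an input-centric forward pass, which uses only quadratic-cost matrix-vector products. The function names, the number of Neumann terms, and the small-magnitude initialization are illustrative assumptions.

```python
import torch

def cayley_neumann(W_param, num_terms=4):
    """Approximate the Cayley transform Q = (I + A)(I - A)^{-1} of a
    skew-symmetric matrix A, replacing the exact inverse with a truncated
    Neumann series (I - A)^{-1} ~ I + A + A^2 + ... + A^k."""
    A = W_param - W_param.T            # enforce skew-symmetry: A^T = -A
    I = torch.eye(A.shape[0], device=A.device, dtype=A.dtype)
    neumann = I.clone()
    term = I.clone()
    for _ in range(num_terms):
        term = term @ A                # accumulate A^k
        neumann = neumann + term
    return (I + A) @ neumann           # approximately orthogonal when ||A|| is small

def forward_weight_centric(W0, Q, x):
    # Weight-centric OFT: materialize the adapted weight Q @ W0 with a
    # matrix-matrix product (cubic in the hidden dimension), then apply it.
    return (Q @ W0) @ x

def forward_input_centric(W0, Q, x):
    # Input-centric reformulation: keep W0 frozen, apply it first, then
    # rotate the activation with a single matrix-vector product (quadratic cost).
    return Q @ (W0 @ x)

# Tiny demonstration on random data (all sizes and values are illustrative).
d = 16
W0 = torch.randn(d, d)                 # frozen pretrained weight
P = 0.01 * torch.randn(d, d)           # trainable parameter; small init keeps the series accurate
x = torch.randn(d)

Q = cayley_neumann(P, num_terms=4)
print(torch.allclose(Q.T @ Q, torch.eye(d), atol=1e-3))            # near-orthogonal
print(torch.allclose(forward_weight_centric(W0, Q, x),
                     forward_input_centric(W0, Q, x), atol=1e-4))   # same output, cheaper path
```

The two forward functions produce the same output by associativity of matrix multiplication; the input-centric version simply avoids ever forming the adapted weight, which is the source of OFTv2's reported speed and memory savings.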