
Orthogonal Finetuning Made Scalable

June 24, 2025
Authors: Zeju Qiu, Weiyang Liu, Adrian Weller, Bernhard Schölkopf
cs.AI

Abstract

Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley-Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
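As an illustration of the two ideas described above, the sketch below shows how an input-centric, matrix-free orthogonal adapter might look in PyTorch: the approximately orthogonal factor is applied to the layer's output activations through repeated matrix-vector products rather than by materializing an updated weight Q @ W, and the matrix inversion in the Cayley transform is replaced by a truncated Neumann series. This is a minimal sketch under stated assumptions, not the authors' released implementation; the names OFTv2Linear, apply_cayley_neumann, and neumann_terms are placeholders, and a dense generator A is used only for brevity where OFT would normally use block-diagonal factors for parameter efficiency.

```python
import torch
import torch.nn as nn


def apply_cayley_neumann(h: torch.Tensor, A: torch.Tensor, neumann_terms: int = 5) -> torch.Tensor:
    """Apply Q = (I + S)(I - S)^{-1} to activations h, where S = A - A^T is
    skew-symmetric and the inverse is approximated by a truncated Neumann
    series (I - S)^{-1} ~ I + S + S^2 + ... Only matrix-vector products with
    the activations are used, so the per-token cost stays quadratic in the
    feature dimension instead of cubic."""
    S = A - A.T                      # skew-symmetric generator
    # u ~ (I - S)^{-1} h, accumulated term by term
    u = h
    term = h
    for _ in range(neumann_terms):
        term = term @ S.T            # one matrix-vector product per token
        u = u + term
    # Q h = (I + S) u
    return u + u @ S.T


class OFTv2Linear(nn.Module):
    """Input-centric orthogonal-finetuning-style wrapper around a frozen
    nn.Linear. The frozen (possibly quantized) base weight is never rewritten
    as Q @ W; the orthogonal transform is applied to the output activations,
    keeping the computation matrix-free."""

    def __init__(self, base: nn.Linear, neumann_terms: int = 5):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                 # keep base weights frozen
        d = base.out_features
        # Dense trainable generator, used here for brevity; OFT typically
        # uses block-diagonal factors for parameter efficiency.
        self.A = nn.Parameter(torch.zeros(d, d))
        self.neumann_terms = neumann_terms

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base(x)                            # frozen base projection
        return apply_cayley_neumann(h, self.A, self.neumann_terms)
```

At initialization A = 0, so S = 0 and Q reduces to the identity, meaning the wrapped layer reproduces the pretrained model exactly before any training step; the trainable parameters then only rotate the frozen layer's outputs, which reflects the orthogonality constraint the abstract attributes to OFT.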