Orthogonal Finetuning Made Scalable

Abstract

Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley-Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in the Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
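
The two ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes the common conventions that the orthogonal matrix is R = (I + Q)(I - Q)^{-1} for a skew-symmetric Q, that the inverse is approximated by the truncated series I + Q + ... + Q^k, and that the adapted layer computes y = (R W0)^T x. All function names (`cayley_neumann`, `forward_weight_centric`, `forward_input_centric`) and the toy values are hypothetical, and plain Python lists stand in for tensors.

```python
# Illustrative sketch only: conventions and names are assumptions, not the paper's code.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def cayley_neumann(Q, num_terms):
    """Approximate R = (I + Q)(I - Q)^{-1}, replacing the inverse
    with the truncated Neumann series I + Q + ... + Q^num_terms."""
    n = len(Q)
    series = identity(n)  # running sum of powers of Q
    power = identity(n)   # current power of Q
    for _ in range(num_terms):
        power = matmul(power, Q)
        series = add(series, power)
    return matmul(add(identity(n), Q), series)

def orthogonality_error(R):
    """Largest absolute entry of R^T R - I."""
    n = len(R)
    gram = matmul(transpose(R), R)
    eye = identity(n)
    return max(abs(gram[i][j] - eye[i][j]) for i in range(n) for j in range(n))

def forward_weight_centric(R, W0, x):
    # Build the adapted weight first: one matrix-matrix product per layer.
    W = matmul(R, W0)
    return matvec(transpose(W), x)

def forward_input_centric(R, W0, x):
    # Transform the input instead: two matrix-vector products, no merged weight.
    rotated = matvec(transpose(R), x)
    return matvec(transpose(W0), rotated)

# Toy example: a small skew-symmetric Q (Q^T = -Q), a 3x2 frozen weight, one input.
Q = [[0.0, 0.10, -0.05],
     [-0.10, 0.0, 0.02],
     [0.05, -0.02, 0.0]]
W0 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.25]]
x = [1.0, -2.0, 0.5]

R = cayley_neumann(Q, num_terms=5)
y_weight = forward_weight_centric(R, W0, x)
y_input = forward_input_centric(R, W0, x)
max_diff = max(abs(a - b) for a, b in zip(y_weight, y_input))

print("orthogonality error:", orthogonality_error(R))
print("max output difference:", max_diff)
```

Both forward passes return the same output up to floating-point rounding, since (R W0)^T x = W0^T (R^T x); only the order of operations differs. The truncated series converges only when Q is small in norm, which is why the toy Q uses small entries.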
