Orthogonal Finetuning Made Scalable

Abstract

Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. To overcome this, we propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. We further introduce the Cayley-Neumann parameterization, an efficient orthogonal parameterization that approximates the matrix inversion in the Cayley transform via a truncated Neumann series. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without compromising performance. In addition, we extend OFTv2 to support finetuning quantized foundation models and show that it outperforms the popular QLoRA in training stability, efficiency, and memory usage.
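The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the Cayley transform is written as R = (I + Q)(I - Q)^{-1} with Q skew-symmetric, that the layer computes y = R W x, and that Q has spectral norm below 1 so the Neumann series converges. The function names, shapes, and the `num_terms` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def cayley_neumann(Q, num_terms=5):
    """Approximate R = (I + Q)(I - Q)^{-1} without a matrix inverse.

    (I - Q)^{-1} is replaced by the truncated Neumann series
    I + Q + Q^2 + ... + Q^num_terms (assumes the spectral norm of Q is < 1).
    """
    n = Q.shape[0]
    identity = np.eye(n)
    inv_approx = identity.copy()
    power = identity.copy()
    for _ in range(num_terms):
        power = power @ Q          # next power of Q
        inv_approx += power        # accumulate the series
    return (identity + Q) @ inv_approx


def forward_weight_centric(R, W, x):
    """Weight-centric: form the adapted weight R @ W first (matrix-matrix product)."""
    return (R @ W) @ x


def forward_input_centric(R, W, x):
    """Input-centric: apply W, then R, to the input (two matrix-vector products)."""
    return R @ (W @ x)


# Illustrative usage with assumed sizes: d_out = 8, d_in = 5.
rng = np.random.default_rng(0)
A = 0.01 * rng.standard_normal((8, 8))
Q = A - A.T                        # skew-symmetric, small norm
R = cayley_neumann(Q, num_terms=8)
W = rng.standard_normal((8, 5))
x = rng.standard_normal(5)
y = forward_input_centric(R, W, x)
```

Both forward functions return the same vector by associativity. The difference is cost: forming `R @ W` takes on the order of d_out^2 * d_in operations, while the input-centric order needs only d_out * d_in + d_out^2 per input vector.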
