그들을 다 다스리는 초기화: 설명된 분산을 통한 미세 조정 적응

초록

기반 모델 (Foundation models, FMs)은 대규모 데이터셋에서 사전 훈련된 후 특정 응용 프로그램을 위한 하류 작업에서 세밀 조정됩니다. 가장 성공적이고 가장 일반적으로 사용되는 세밀 조정 방법은 사전 훈련된 가중치를 저랭크 적응 (Low-rank adaptation, LoRA)을 통해 업데이트하는 것입니다. LoRA는 일반적으로 모델 가중치 전체에 균일한 랭크 분포로 임의로 초기화된 새로운 가중치 행렬을 도입합니다. 최근 연구는 가중치 중심 초기화 또는 훈련 중 적응적 랭크 학습에 초점을 맞추고 있습니다. 그러나 이러한 두 접근 방식은 독립적으로만 조사되어 왔으며, 이는 수렴 속도가 느리거나 균일한 랭크 분포로 이어져 최적의 성능을 발휘하지 못하게 됩니다. 우리는 활성화 벡터의 미니배치에서 특이값 분해를 계산하여 새로운 가중치를 데이터 기반으로 초기화하는 방식으로 LoRA를 개선하는 것을 제안합니다. 그런 다음, 우리는 얻은 오른쪽 특이 벡터로 LoRA 행렬을 초기화하고 모든 가중치 행렬 사이에서 분산의 최대 양을 설명하기 위해 랭크를 재분배하고 표준 LoRA 세밀 조정 절차를 계속합니다. 이로써 우리의 새로운 방법인 설명된 분산 적응 (Explained Variance Adaptation, EVA)가 탄생합니다. 우리는 언어 생성 및 이해부터 이미지 분류 및 강화 학습에 이르기까지 다양한 세밀 조정 작업에 EVA를 적용합니다. EVA는 경쟁 상대보다 빠른 수렴을 보이며 도메인 당 다양한 작업에서 가장 높은 평균 점수를 달성합니다.

English

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across model weights. Recent works focus on weight-driven initialization or learning of adaptive ranks during training. Both approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, in turn leading to sub-optimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition on minibatches of activation vectors. Then, we initialize the LoRA matrices with the obtained right-singular vectors and re-distribute ranks among all weight matrices to explain the maximal amount of variance and continue the standard LoRA fine-tuning procedure. This results in our new method Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.

그들을 다 다스리는 초기화: 설명된 분산을 통한 미세 조정 적응

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

초록

Support