One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Abstract

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random, with a uniform rank distribution across model weights. Recent works focus on either weight-driven initialization or learning adaptive ranks during training. Both approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, which in turn leads to sub-optimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner, computing a singular value decomposition (SVD) on minibatches of activation vectors. We then initialize the LoRA matrices with the obtained right-singular vectors, re-distribute ranks among all weight matrices to explain the maximal amount of variance, and continue with the standard LoRA fine-tuning procedure. This results in our new method, Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competing methods and attains the highest average score across a multitude of tasks per domain.
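
The initialization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `eva_init`, the `activations` dictionary, and the `rank_budget` parameter are hypothetical, and it assumes a single minibatch per layer and uses the normalized squared singular values as the explained-variance measure. Each layer's activations are decomposed with SVD, all components are ranked globally by their explained-variance ratio, and each layer keeps the right-singular vectors that fall within the shared rank budget.

```python
import numpy as np


def eva_init(activations, rank_budget):
    """Sketch of a data-driven LoRA initialization (illustrative only).

    activations: dict mapping a layer name to an (n_samples, d_in) array
                 of input activations from one minibatch (assumption).
    rank_budget: total number of ranks to distribute across all layers.
    Returns a dict mapping layer name to an (r_layer, d_in) array whose
    rows are right-singular vectors; layers that receive no rank are omitted.
    """
    components = []  # (explained-variance ratio, layer name, singular vector)
    for name, x in activations.items():
        # Rows of vt are the right-singular vectors of the activation matrix.
        _, s, vt = np.linalg.svd(x, full_matrices=False)
        # Assumed measure: share of the layer's total squared singular values.
        ratios = s**2 / np.sum(s**2)
        for ratio, v in zip(ratios, vt):
            components.append((ratio, name, v))

    # Rank redistribution: sort all components globally and keep the top ones.
    components.sort(key=lambda c: c[0], reverse=True)
    selected = components[:rank_budget]

    init = {}
    for name in activations:
        rows = [v for _, layer, v in selected if layer == name]
        if rows:
            init[name] = np.stack(rows)
    return init
```

In this sketch a layer whose activations concentrate their variance in a few directions receives few ranks, while a layer with evenly spread variance receives more. The returned matrices would serve as the LoRA down-projection; in standard LoRA the up-projection starts at zero, so the adapted model initially matches the pre-trained one.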
