One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
October 9, 2024
Authors: Fabian Paischer, Lukas Hauzenberger, Thomas Schmied, Benedikt Alkin, Marc Peter Deisenroth, Sepp Hochreiter
cs.AI
Abstract
Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across model weights. Recent works focus either on weight-driven initialization or on learning adaptive ranks during training. Both approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, in turn leading to sub-optimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner: we compute the singular value decomposition of minibatches of activation vectors, initialize the LoRA matrices with the obtained right-singular vectors, and re-distribute ranks among all weight matrices so as to explain the maximal amount of variance, before continuing the standard LoRA fine-tuning procedure. This results in our new method, Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.
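To make the two ideas in the abstract concrete, here is a minimal PyTorch sketch of (a) an SVD over a minibatch of activations used to initialize the LoRA down-projection with right-singular vectors, and (b) a redistribution of a total rank budget toward the singular components that explain the most variance. The function names (`eva_init`, `redistribute_ranks`) and the greedy budget allocation are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def eva_init(x: torch.Tensor, d_out: int, rank: int):
    """Data-driven LoRA init for one linear layer (illustrative sketch).

    x: minibatch of activation vectors entering the layer, shape (batch, d_in).
    Returns (A, B, s): A holds the top-`rank` right-singular vectors of x,
    B is zero so the adapter update B @ A starts as a null update, and s
    are the singular values (used later for rank redistribution).
    """
    # Rows of vh are the right-singular vectors of the activation minibatch.
    _, s, vh = torch.linalg.svd(x, full_matrices=False)
    A = vh[:rank].clone()          # (rank, d_in) down-projection
    B = torch.zeros(d_out, rank)   # (d_out, rank) up-projection, zero init
    return A, B, s

def redistribute_ranks(singvals: dict[str, torch.Tensor],
                       budget: int) -> dict[str, int]:
    """Assign a total rank budget across layers, greedily picking the
    singular components with the largest explained-variance ratios
    (an assumed allocation scheme for illustration).
    """
    scored = []
    for name, s in singvals.items():
        # Fraction of the layer's activation variance each component explains.
        ratios = s.square() / s.square().sum()
        scored += [(r.item(), name) for r in ratios]
    scored.sort(reverse=True)      # components explaining most variance first
    ranks = {name: 0 for name in singvals}
    for _, name in scored[:budget]:
        ranks[name] += 1
    return ranks
```

After this initialization, the adapted layer computes `W0 @ x + B @ A @ x` exactly as in standard LoRA; only the initialization of A and the per-layer ranks differ from the usual random, uniform-rank setup.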