One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
October 9, 2024
Authors: Fabian Paischer, Lukas Hauzenberger, Thomas Schmied, Benedikt Alkin, Marc Peter Deisenroth, Sepp Hochreiter
cs.AI
Abstract
Foundation models (FMs) are pre-trained on large-scale datasets and then
fine-tuned on a downstream task for a specific application. The most successful
and most commonly used fine-tuning method is to update the pre-trained weights
via a low-rank adaptation (LoRA). LoRA introduces new weight matrices that are
usually initialized at random with a uniform rank distribution across model
weights. Recent works focus on weight-driven initialization or learning of
adaptive ranks during training. Both approaches have only been investigated in
isolation, resulting in slow convergence or a uniform rank distribution, in
turn leading to sub-optimal performance. We propose to enhance LoRA by
initializing the new weights in a data-driven manner by computing singular
value decomposition on minibatches of activation vectors. Then, we initialize
the LoRA matrices with the obtained right-singular vectors and re-distribute
ranks among all weight matrices to explain the maximal amount of variance and
continue the standard LoRA fine-tuning procedure. This results in our new
method Explained Variance Adaptation (EVA). We apply EVA to a variety of
fine-tuning tasks ranging from language generation and understanding to image
classification and reinforcement learning. EVA exhibits faster convergence than
competitors and attains the highest average score across a multitude of tasks
per domain.
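To make the initialization step concrete, the sketch below illustrates the data-driven procedure described in the abstract, assuming PyTorch and a single pre-trained linear layer: an SVD of a minibatch of the layer's input activations yields right-singular vectors that seed the LoRA down-projection. The function name `eva_init_lora_A`, the centering step, and the placeholder sizes are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of EVA-style data-driven LoRA initialization (PyTorch assumed).
# Names such as `eva_init_lora_A` are placeholders, not from the authors' code.
import torch


def eva_init_lora_A(activations: torch.Tensor, rank: int):
    """Compute the LoRA down-projection A from a minibatch of layer inputs.

    activations: (num_tokens, in_features) inputs to one pre-trained linear layer.
    Returns A of shape (rank, in_features) whose rows are the top right-singular
    vectors of the activation minibatch, plus the variance ratio each explains.
    """
    # Centering is an assumption here so that squared singular values correspond
    # to explained variance; the abstract only specifies an SVD of activations.
    x = activations - activations.mean(dim=0, keepdim=True)
    # x = U @ diag(S) @ Vh; rows of Vh are the right-singular vectors.
    _, s, vh = torch.linalg.svd(x, full_matrices=False)
    explained = s.pow(2) / s.pow(2).sum()
    return vh[:rank], explained[:rank]


# Usage sketch with placeholder sizes: A is set to the EVA directions and B to
# zeros (as in standard LoRA), so the update B @ A starts at zero; ranks could
# then be re-allocated across layers according to the explained-variance scores.
in_features, out_features, rank = 768, 768, 8
acts = torch.randn(512, in_features)  # stand-in for activations from a forward pass
A, var_ratio = eva_init_lora_A(acts, rank)
B = torch.zeros(out_features, rank)
```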