

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

October 9, 2024
作者: Fabian Paischer, Lukas Hauzenberger, Thomas Schmied, Benedikt Alkin, Marc Peter Deisenroth, Sepp Hochreiter
cs.AI

Abstract

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across model weights. Recent works focus on weight-driven initialization or learning of adaptive ranks during training. Both approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, in turn leading to sub-optimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition on minibatches of activation vectors. Then, we initialize the LoRA matrices with the obtained right-singular vectors and re-distribute ranks among all weight matrices to explain the maximal amount of variance and continue the standard LoRA fine-tuning procedure. This results in our new method Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.
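Below is a minimal PyTorch sketch of the two steps the abstract describes: an SVD over a minibatch of activation vectors to obtain right-singular vectors for initializing the LoRA A matrices, and a redistribution of a global rank budget toward the components that explain the most variance. The function names (`eva_components`, `redistribute_ranks`) and the greedy allocation rule are illustrative assumptions based on the abstract, not the authors' released implementation.

```python
import torch

def eva_components(activations: torch.Tensor, max_rank: int):
    """SVD over a minibatch of activation vectors seen by one weight matrix.

    activations: (num_tokens, in_features) inputs observed by the layer.
    Returns the top right-singular vectors (candidates for initializing
    LoRA's A matrix) and each component's explained-variance ratio.
    """
    _, S, Vh = torch.linalg.svd(activations, full_matrices=False)
    ratios = S.pow(2) / S.pow(2).sum()  # fraction of variance per component
    return Vh[:max_rank], ratios[:max_rank]

def redistribute_ranks(ratios_per_layer, total_budget):
    """Assumed greedy allocation: spend a global rank budget on the
    components with the highest explained-variance ratio, across layers."""
    scored = sorted(
        ((r.item(), layer)
         for layer, ratios in enumerate(ratios_per_layer)
         for r in ratios),
        reverse=True,
    )
    ranks = [0] * len(ratios_per_layer)
    for _, layer in scored[:total_budget]:
        ranks[layer] += 1
    return ranks

# Usage sketch: for each adapted layer l, initialize A_l with the first
# ranks[l] rows of its Vh and B_l with zeros, then run standard LoRA
# fine-tuning on the downstream task.
```

After this data-driven initialization, training proceeds exactly as in standard LoRA; only the starting point of the A matrices and the per-layer rank distribution differ.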

