VeRA: Vector-based Random Matrix Adaptation
October 17, 2023
Authors: Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki Markus Asano
cs.AI
Abstract
Low-rank adaptation (LoRA) is a popular method that reduces the number of
trainable parameters when finetuning large language models, but still faces
acute storage challenges when scaling to even larger models or deploying
numerous per-user or per-task adapted models. In this work, we present
Vector-based Random Matrix Adaptation (VeRA), which reduces the number of
trainable parameters by 10x compared to LoRA, yet maintains the same
performance. It achieves this by using a single pair of low-rank matrices
shared across all layers and learning small scaling vectors instead. We
demonstrate its effectiveness on the GLUE and E2E benchmarks, and show its
application in instruction-following with just 1.4M parameters using the Llama2
7B model.
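
The mechanism described in the abstract (one pair of low-rank matrices shared across all layers, with small per-layer scaling vectors as the only trainable parameters) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The abstract does not state the update formula, so the form used here, h = W0 x + b * (B (d * (A x))), is an assumption, as are the class name VeraLayerSketch, the helper names, the toy dimensions, and the initial values of the scaling vectors.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def random_matrix(rows, cols, rng):
    """Random matrix with standard normal entries; frozen, never trained."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class VeraLayerSketch:
    """Hypothetical adapted linear layer (illustrative names throughout).

    W0: frozen pretrained weight, shape (d_out, d_in)
    A:  frozen random matrix, shape (r, d_in), shared by every layer
    B:  frozen random matrix, shape (d_out, r), shared by every layer
    d, b: per-layer trainable scaling vectors of length r and d_out
    """

    def __init__(self, W0, A, B):
        self.W0, self.A, self.B = W0, A, B
        self.d = [0.1] * len(A)  # assumed initial value
        self.b = [0.0] * len(B)  # zeros, so the adapter starts as a no-op

    def forward(self, x):
        base = matvec(self.W0, x)
        low = [di * v for di, v in zip(self.d, matvec(self.A, x))]
        up = [bi * v for bi, v in zip(self.b, matvec(self.B, low))]
        return [p + q for p, q in zip(base, up)]

    def num_trainable(self):
        # Only the two scaling vectors are trained.
        return len(self.d) + len(self.b)


def lora_trainable(d_in, d_out, r):
    """Trainable parameters of one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


rng = random.Random(0)
d_in, d_out, r = 8, 8, 2  # toy sizes chosen for illustration
A = random_matrix(r, d_in, rng)   # created once, reused by all layers
B = random_matrix(d_out, r, rng)
layers = [VeraLayerSketch(random_matrix(d_out, d_in, rng), A, B) for _ in range(3)]

x = [1.0] * d_in
y = layers[0].forward(x)
print(layers[0].num_trainable(), lora_trainable(d_in, d_out, r))
```

Under these assumptions each layer trains r + d_out values, where a LoRA adapter of the same rank trains r * (d_in + d_out). The ratio between the two depends on the ranks and layer shapes chosen for each method, so the toy numbers here are not meant to reproduce the 10x figure reported in the abstract.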