VeRA: Vector-based Random Matrix Adaptation
October 17, 2023
Authors: Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki Markus Asano
cs.AI
Abstract
Low-rank adaptation (LoRA) is a popular method that reduces the number of
trainable parameters when finetuning large language models, but still faces
acute storage challenges when scaling to even larger models or deploying
numerous per-user or per-task adapted models. In this work, we present
Vector-based Random Matrix Adaptation (VeRA), which reduces the number of
trainable parameters by 10x compared to LoRA, yet maintains the same
performance. It achieves this by using a single pair of low-rank matrices
shared across all layers and learning small scaling vectors instead. We
demonstrate its effectiveness on the GLUE and E2E benchmarks, and show its
application in instruction-following with just 1.4M parameters using the Llama2
7B model.
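
The mechanism described in the abstract lends itself to a compact implementation. Below is a minimal PyTorch sketch, not the authors' code: the class name `VeRALinear`, the initialization constants, and the way the shared matrices are passed in are illustrative assumptions. The pretrained weight W and the shared random matrices A and B stay frozen; only the per-layer scaling vectors d and b are trained, which is where the parameter savings over LoRA come from.

```python
import torch
import torch.nn as nn

class VeRALinear(nn.Module):
    """Sketch of a VeRA-adapted linear layer.

    Computes h = W x + b * (B (d * (A x))), where W is the frozen
    pretrained weight, A and B are frozen random low-rank matrices
    shared across all adapted layers, and only the scaling vectors
    d (size r) and b (size out_features) are trainable.
    """

    def __init__(self, base: nn.Linear, A: torch.Tensor, B: torch.Tensor, r: int):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weight
        # Shared frozen matrices; plain attributes here so each layer
        # references the same tensors instead of storing copies.
        self.A = A                                # shape (r, in_features)
        self.B = B                                # shape (out_features, r)
        # Trainable scaling vectors (init values are assumptions).
        self.d = nn.Parameter(torch.full((r,), 0.1))
        self.b = nn.Parameter(torch.zeros(base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (x @ self.A.T) * self.d           # project down, scale by d
        delta = delta @ self.B.T                  # project up
        return self.base(x) + self.b * delta      # scale by b, add to frozen path

# Usage sketch: generate A and B once and reuse them for every layer.
r, d_in, d_out = 8, 4096, 4096
A = torch.randn(r, d_in) / d_in ** 0.5            # hypothetical scaled-Gaussian init
B = torch.randn(d_out, r) / r ** 0.5
layer = VeRALinear(nn.Linear(d_in, d_out), A, B, r)
```

With shared A and B, the trainable (and stored) state per adapted layer is just r + out_features scalars, versus r * (in_features + out_features) for a LoRA update of the same rank.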