VeRA: Vector-based Random Matrix Adaptation

Abstract

Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when fine-tuning large language models, but it still faces acute storage challenges when scaling to even larger models or when deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which reduces the number of trainable parameters by a factor of 10 compared to LoRA while maintaining the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks, and show its application to instruction following with just 1.4M trainable parameters using the Llama2 7B model.
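The mechanism described in the abstract (one low-rank matrix pair shared across all layers, plus small learned scaling vectors per layer) can be illustrated with a minimal NumPy sketch. Everything beyond that description is an assumption made for illustration: the layer sizes, the rank, the initial values of the scaling vectors, keeping the shared pair frozen and randomly initialized, and the names `VeRALayer`, `A`, `B`, `d`, and `b`. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumed, not taken from the paper).
d_in, d_out, rank, n_layers = 16, 16, 4, 3

# One random low-rank pair, kept frozen and shared by every adapted layer.
A = rng.standard_normal((rank, d_in))
B = rng.standard_normal((d_out, rank))


class VeRALayer:
    """Hypothetical adapted linear layer: h = W0 x + diag(b) B diag(d) A x."""

    def __init__(self, W0):
        self.W0 = W0                  # frozen pretrained weight
        self.d = np.full(rank, 0.1)   # trainable scaling vector (assumed init)
        self.b = np.zeros(d_out)      # trainable scaling vector; zeros give a zero initial update

    def forward(self, x):
        # Element-wise products play the role of the diagonal scaling matrices.
        delta = self.b * (B @ (self.d * (A @ x)))
        return self.W0 @ x + delta


layers = [VeRALayer(rng.standard_normal((d_out, d_in))) for _ in range(n_layers)]

# Trainable parameters: VeRA learns only two vectors per layer,
# while LoRA learns a full low-rank pair per layer.
vera_trainable = n_layers * (rank + d_out)
lora_trainable = n_layers * rank * (d_in + d_out)
print(vera_trainable, lora_trainable)
```

Because `b` starts at zero in this sketch, each layer initially reproduces the frozen pretrained output. Only the per-layer vectors would need to be stored for each adapted model, since the shared pair can be regenerated from the random seed.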
