Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Abstract

While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods improve LoRA by initializing it from a static subset of singular value decomposition (SVD) components, which makes suboptimal use of pre-trained knowledge. Another way to improve LoRA is to incorporate a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to incorporate SVD priors into the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with that of a fully fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE's efficiency and performance. Experiments across 25 datasets, including natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT's state-of-the-art performance, closing the gap with Full FT.
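To make the idea of an "SVD-structured MoE" concrete, here is a minimal NumPy sketch. It rests on assumptions that the abstract does not state. First, each LoRA expert is initialized from a contiguous block of `rank` singular components of the pretrained weight `W0`. Second, those initialized products are subtracted from `W0`, so that `W_res + Σ_e B_e A_e` reproduces `W0` at initialization. Third, routing is a softmax over the top-k router logits. The names `svd_expert_init`, `moe_lora_forward`, and `router` are hypothetical. The paper's theoretical scaling factor is deliberately left out, because the abstract does not give its form. Treat this as an illustration of the general technique, not as the GOAT implementation.

```python
import numpy as np


def svd_expert_init(W0, num_experts, rank):
    """Hypothetical sketch: split the SVD of W0 into `num_experts` contiguous
    blocks of `rank` singular components and build one LoRA expert (B_e, A_e)
    per block. Each B_e @ A_e is subtracted from W0, so that
    W_res + sum_e B_e @ A_e == W0 at initialization."""
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    if num_experts * rank > S.size:
        raise ValueError("num_experts * rank exceeds the number of singular values")
    experts = []
    W_res = W0.astype(float).copy()
    for e in range(num_experts):
        idx = slice(e * rank, (e + 1) * rank)
        sqrt_s = np.sqrt(S[idx])          # split each singular value evenly between B and A
        B = U[:, idx] * sqrt_s            # shape (d_out, rank)
        A = sqrt_s[:, None] * Vt[idx]     # shape (rank, d_in)
        experts.append((B, A))
        W_res -= B @ A                    # keep the combined weight equal to W0
    return W_res, experts


def moe_lora_forward(x, W_res, experts, router, top_k):
    """Hypothetical top-k routed forward pass for a single input vector x.
    `router` has shape (num_experts, d_in); the selected experts' outputs are
    combined with softmax weights over their router logits."""
    logits = router @ x
    top = np.argsort(logits)[-top_k:]             # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())   # numerically stable softmax
    w /= w.sum()
    y = W_res @ x
    for weight, e in zip(w, top):
        B, A = experts[e]
        y += weight * (B @ (A @ x))               # low-rank expert update
    return y
```

For example, with `W0` of shape (8, 6), `num_experts=3`, and `rank=2`, the experts together cover the full spectrum, so `W_res` is numerically zero. With `top_k=1`, the forward pass reduces to `W_res @ x` plus the single highest-scoring expert's update.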