Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
February 24, 2025
Authors: Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu Cheng
cs.AI
Abstract
While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for
Large Language Models (LLMs), its performance often falls short of Full
Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with
static singular value decomposition (SVD) subsets, leading to suboptimal
leveraging of pre-trained knowledge. Another path for improving LoRA is
incorporating a Mixture-of-Experts (MoE) architecture. However, weight
misalignment and complex gradient dynamics make it challenging to adopt the SVD
prior within the LoRA MoE architecture. To mitigate these issues, we propose
Great LoRA Mixture-of-Expert
(GOAT), a framework that (1) adaptively integrates relevant priors using an
SVD-structured MoE, and (2) aligns optimization with that of a fully fine-tuned MoE by
deriving a theoretical scaling factor. We demonstrate that proper scaling,
without modifying the architecture or training algorithms, boosts LoRA MoE's
efficiency and performance. Experiments across 25 datasets, including natural
language understanding, commonsense reasoning, image classification, and
natural language generation, demonstrate GOAT's state-of-the-art performance,
closing the gap with Full FT.
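
To make the abstract's first idea concrete, below is a minimal sketch of what an SVD-structured LoRA mixture-of-experts layer could look like: each expert's low-rank factors are initialized from a different slice of the frozen weight's singular value decomposition, and a router mixes the expert updates. The class name SVDLoRAMoE, the slicing rule, the routing scheme, and all hyperparameters are illustrative assumptions, not the paper's exact construction.

# Hypothetical sketch of an SVD-structured LoRA mixture-of-experts layer
# (names, slicing rule, and routing are assumptions, not the paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDLoRAMoE(nn.Module):
    def __init__(self, weight: torch.Tensor, num_experts: int = 4,
                 rank: int = 8, scaling: float = 2.0):
        super().__init__()
        out_dim, in_dim = weight.shape
        self.weight = nn.Parameter(weight, requires_grad=False)  # frozen pre-trained W
        self.scaling = scaling
        self.router = nn.Linear(in_dim, num_experts, bias=False)

        # SVD of the frozen weight: W = U diag(S) V^T
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        assert num_experts * rank <= S.numel(), "not enough singular directions"

        self.A = nn.ParameterList()  # per-expert (rank x in_dim) factors
        self.B = nn.ParameterList()  # per-expert (out_dim x rank) factors
        for e in range(num_experts):
            # Each expert starts from a different slice of singular directions,
            # so different experts carry different parts of the pre-trained prior.
            idx = slice(e * rank, (e + 1) * rank)
            sqrt_s = torch.diag(S[idx].sqrt())
            self.A.append(nn.Parameter(sqrt_s @ Vh[idx, :]))
            self.B.append(nn.Parameter(U[:, idx] @ sqrt_s))
        # NOTE: SVD-initialized LoRA variants usually subtract the extracted
        # components from the frozen weight so the initial output is unchanged;
        # that bookkeeping is omitted here for brevity.

    def forward(self, x: torch.Tensor, top_k: int = 2) -> torch.Tensor:
        base = F.linear(x, self.weight)               # frozen path
        gates = F.softmax(self.router(x), dim=-1)     # token-wise routing scores
        topv, topi = gates.topk(top_k, dim=-1)
        topv = topv / topv.sum(dim=-1, keepdim=True)  # renormalize top-k weights

        update = torch.zeros_like(base)
        for k in range(top_k):
            for e in range(len(self.A)):
                mask = (topi[..., k] == e).unsqueeze(-1).float()
                delta = F.linear(F.linear(x, self.A[e]), self.B[e])
                update = update + mask * topv[..., k:k + 1] * delta
        return base + self.scaling * update

In a layer like this, only the router and the per-expert A/B factors would receive gradients during fine-tuning, which is what keeps the trainable-parameter count far below Full FT.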
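
The abstract's second idea, a theoretically derived scaling factor, can be motivated with a toy numerical check. The snippet below is only an illustration of why the scaling matters, not the paper's derivation: the factor s multiplies the low-rank update s * B @ A, so one SGD step changes the effective weight by an amount that grows roughly like s squared. All symbols (s, A, B, lr) and values here are arbitrary stand-ins.

# Toy illustration (not the paper's derivation) of why the LoRA scaling
# factor matters: with effective weight W + s * B @ A, the size of the
# effective update after one SGD step scales roughly with s**2.
import torch

torch.manual_seed(0)
d, r, lr = 32, 4, 1e-2
W = torch.randn(d, d)        # random stand-in for a frozen pre-trained weight
x = torch.randn(64, d)
y = torch.randn(64, d)       # dummy regression target

for s in (1.0, 4.0, 16.0):
    A = (0.01 * torch.randn(r, d)).requires_grad_()
    B = torch.zeros(d, r, requires_grad=True)   # standard LoRA-style zero init
    loss = ((x @ (W + s * (B @ A)).T - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        delta = s * ((B - lr * B.grad) @ (A - lr * A.grad)) - s * (B @ A)
    print(f"s = {s:5.1f}   effective weight update norm = {delta.norm():.3e}")

The printout shows the effective update growing sharply with s, which is why a properly chosen scaling factor, rather than the conventional default, can change how closely the adapter's optimization tracks full fine-tuning without touching the architecture or the training algorithm.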