Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
February 24, 2025
作者: Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu Cheng
cs.AI
Abstract
While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal leveraging of pre-trained knowledge. Another path for improving LoRA is incorporating a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to adopt SVD priors in the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with that of a fully fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE's efficiency and performance. Experiments across 25 datasets, including natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT's state-of-the-art performance, closing the gap with Full FT.
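
To make the two ideas in the abstract concrete, below is a minimal illustrative sketch in PyTorch, not the authors' released GOAT implementation: a LoRA Mixture-of-Experts adapter whose experts are initialized from different slices of the SVD of the frozen pre-trained weight, and whose output is multiplied by a scaling factor standing in for the theoretically derived one. Names such as `LoRAMoEAdapter`, `n_experts`, and `scale` are hypothetical.

```python
# Illustrative sketch only; details (routing, exact scaling, compensation of the
# frozen weight) differ from the paper's actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAMoEAdapter(nn.Module):
    def __init__(self, weight: torch.Tensor, rank: int = 8,
                 n_experts: int = 4, scale: float = 2.0):
        super().__init__()
        out_dim, in_dim = weight.shape
        assert n_experts * rank <= min(out_dim, in_dim), "not enough singular values"
        self.scale = scale  # stands in for GOAT's theoretically derived factor
        self.router = nn.Linear(in_dim, n_experts, bias=False)

        # SVD of the frozen pre-trained weight: expert e is initialized from its
        # own rank-`rank` band of the spectrum, so each expert starts from a
        # different slice of pre-trained knowledge.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.A, self.B = nn.ParameterList(), nn.ParameterList()
        for e in range(n_experts):
            idx = slice(e * rank, (e + 1) * rank)
            sqrt_s = S[idx].sqrt()
            self.A.append(nn.Parameter(sqrt_s[:, None] * Vh[idx]))    # (rank, in_dim)
            self.B.append(nn.Parameter(U[:, idx] * sqrt_s[None, :]))  # (out_dim, rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft routing for brevity; a sparse top-k router is the more common MoE
        # choice. The gate mixes each expert's low-rank update B_e A_e x.
        gate = F.softmax(self.router(x), dim=-1)  # (..., n_experts)
        delta = sum(gate[..., e:e + 1] * (x @ self.A[e].T @ self.B[e].T)
                    for e in range(len(self.A)))
        # Scaling the mixed update is the knob the paper argues aligns LoRA-MoE
        # optimization with full fine-tuning; here it is just a constant.
        return self.scale * delta
```

In use, the adapter's output would be added to the frozen linear layer's output (roughly `y = x @ W.T + adapter(x)`); the paper's contribution is deriving the specific initialization and scaling factor that make this combination behave like a fully fine-tuned MoE, which the constant `scale` above only approximates.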