MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models

June 6, 2025
作者: Jie Cao, Tianwei Lin, Hongyang He, Rolan Yan, Wenqiao Zhang, Juncheng Li, Dongping Zhang, Siliang Tang, Yueting Zhuang
cs.AI

Abstract

Recent studies integrate Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) to further enhance the performance of parameter-efficient fine-tuning (PEFT) methods in Large Language Model (LLM) applications. Existing methods employ homogeneous MoE-LoRA architectures composed of LoRA experts with either similar or identical structures and capacities. However, these approaches often suffer from representation collapse and expert load imbalance, which negatively impact the potential of LLMs. To address these challenges, we propose a heterogeneous Mixture-of-Adapters (MoA) approach. This method dynamically integrates PEFT adapter experts with diverse structures, leveraging their complementary representational capabilities to foster expert specialization, thereby enhancing the effective transfer of pre-trained knowledge to downstream tasks. MoA supports two variants: (i) Soft MoA achieves fine-grained integration by performing a weighted fusion of all expert outputs; (ii) Sparse MoA activates adapter experts sparsely based on their contribution, achieving this with negligible performance degradation. Experimental results demonstrate that heterogeneous MoA outperforms homogeneous MoE-LoRA methods in both performance and parameter efficiency. Our project is available at https://github.com/DCDmllm/MoA.
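
To make the two routing variants concrete, below is a minimal PyTorch sketch of the general idea, not the authors' implementation (see the linked repository for that). The class names (LoRAAdapter, BottleneckAdapter, MixtureOfAdapters), the particular pair of expert types, and the top-k renormalization are illustrative assumptions: two structurally different adapter experts wrap a frozen linear layer, and a router either fuses all expert outputs with softmax weights (soft) or keeps only the top-k experts per token (sparse).

```python
# Hypothetical sketch of a heterogeneous Mixture-of-Adapters layer (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAAdapter(nn.Module):
    """Low-rank adapter: a rank-r update up(down(x)), initialized to zero."""
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)
        self.up = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op

    def forward(self, x):
        return self.up(self.down(x))


class BottleneckAdapter(nn.Module):
    """Bottleneck adapter with a nonlinearity -- structurally unlike LoRA."""
    def __init__(self, d_in, d_out, hidden=32):
        super().__init__()
        self.down = nn.Linear(d_in, hidden)
        self.up = nn.Linear(hidden, d_out)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return self.up(F.gelu(self.down(x)))


class MixtureOfAdapters(nn.Module):
    """Frozen base linear layer plus a router over heterogeneous adapter experts."""
    def __init__(self, base: nn.Linear, mode="soft", top_k=1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only adapters and router are trainable
        d_in, d_out = base.in_features, base.out_features
        self.experts = nn.ModuleList([
            LoRAAdapter(d_in, d_out),
            BottleneckAdapter(d_in, d_out),
        ])
        self.router = nn.Linear(d_in, len(self.experts))
        self.mode, self.top_k = mode, top_k

    def forward(self, x):
        gate = F.softmax(self.router(x), dim=-1)                   # (..., E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (..., d_out, E)
        if self.mode == "sparse":
            # keep only the top-k experts per token and renormalize their weights
            _, top_idx = gate.topk(self.top_k, dim=-1)
            mask = torch.zeros_like(gate).scatter_(-1, top_idx, 1.0)
            gate = (gate * mask) / (gate * mask).sum(-1, keepdim=True).clamp_min(1e-9)
        mixed = (outs * gate.unsqueeze(-2)).sum(dim=-1)            # weighted fusion
        return self.base(x) + mixed


# Usage: wrap one projection of a frozen transformer block.
layer = MixtureOfAdapters(nn.Linear(768, 768), mode="sparse", top_k=1)
print(layer(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```

Zero-initializing the adapters' up projections keeps the wrapped layer's behavior unchanged at the start of fine-tuning, a common adapter convention; only the adapters and the router receive gradients, while the base weights stay frozen.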