MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
June 6, 2025
Authors: Jie Cao, Tianwei Lin, Hongyang He, Rolan Yan, Wenqiao Zhang, Juncheng Li, Dongping Zhang, Siliang Tang, Yueting Zhuang
cs.AI
Abstract
Recent studies integrate Low-Rank Adaptation (LoRA) and Mixture-of-Experts
(MoE) to further enhance the performance of parameter-efficient fine-tuning
(PEFT) methods in Large Language Model (LLM) applications. Existing methods
employ homogeneous MoE-LoRA architectures composed of LoRA experts with
either similar or identical structures and capacities. However, these
approaches often suffer from representation collapse and expert load imbalance,
which limit the potential of LLMs. To address these challenges, we
propose a heterogeneous Mixture-of-Adapters (MoA) approach.
This method dynamically integrates PEFT adapter experts with diverse
structures, leveraging their complementary representational capabilities to
foster expert specialization, thereby enhancing the effective transfer of
pre-trained knowledge to downstream tasks. MoA supports two variants:
(i) Soft MoA achieves fine-grained integration by performing
a weighted fusion of all expert outputs; (ii) Sparse MoA sparsely activates
adapter experts based on their contribution, incurring only negligible
performance degradation. Experimental results demonstrate that
heterogeneous MoA outperforms homogeneous MoE-LoRA methods in both performance
and parameter efficiency. Our project is available at
https://github.com/DCDmllm/MoA.
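
To make the two routing variants concrete, the following is a minimal PyTorch sketch of a heterogeneous adapter mixture with soft (weighted fusion of all experts) and sparse (top-k masked) routing. This is not the authors' implementation: the choice of expert modules (two LoRA adapters of different rank plus a bottleneck adapter), the ranks, the linear router, and the top-k masking rule are illustrative assumptions based only on this abstract; see the project repository for the actual code.

```python
# Minimal sketch of a heterogeneous Mixture-of-Adapters (MoA) layer.
# NOT the paper's implementation: expert types, ranks, and the top-k rule
# below are illustrative assumptions based only on the abstract above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAAdapter(nn.Module):
    """Low-rank adapter: produces a delta B(A(x)) added to the frozen layer output."""
    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.A = nn.Linear(d_model, rank, bias=False)
        self.B = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.B.weight)  # start with a zero delta

    def forward(self, x):
        return self.B(self.A(x))


class BottleneckAdapter(nn.Module):
    """Bottleneck adapter with a nonlinearity (structurally different from LoRA)."""
    def __init__(self, d_model: int, hidden: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, hidden)
        self.up = nn.Linear(hidden, d_model)
        nn.init.zeros_(self.up.weight)

    def forward(self, x):
        return self.up(F.gelu(self.down(x)))


class HeterogeneousMoA(nn.Module):
    """Routes each token over structurally diverse adapter experts.

    mode='soft'   -> weighted fusion of all expert outputs.
    mode='sparse' -> keep only the top-k routing weights per token (renormalized).
    """
    def __init__(self, d_model: int, mode: str = "soft", top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList([
            LoRAAdapter(d_model, rank=8),
            LoRAAdapter(d_model, rank=32),
            BottleneckAdapter(d_model, hidden=16),
        ])
        self.router = nn.Linear(d_model, len(self.experts))
        self.mode, self.top_k = mode, top_k

    def forward(self, x):  # x: (batch, seq, d_model)
        weights = F.softmax(self.router(x), dim=-1)               # (B, S, E)
        if self.mode == "sparse":
            _, top_idx = weights.topk(self.top_k, dim=-1)
            mask = torch.zeros_like(weights).scatter_(-1, top_idx, 1.0)
            weights = weights * mask
            weights = weights / weights.sum(dim=-1, keepdim=True)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, S, D, E)
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)         # (B, S, D)


if __name__ == "__main__":
    layer = HeterogeneousMoA(d_model=64, mode="sparse", top_k=1)
    delta = layer(torch.randn(2, 10, 64))  # adapter delta for the frozen layer
    print(delta.shape)  # torch.Size([2, 10, 64])
```

In this sketch the heterogeneity comes from mixing adapters of different structure and capacity; in a real PEFT setup the delta would be added to the output of a frozen pre-trained projection, and only the adapters and router would be trained.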