Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
June 17, 2024
作者: Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter
cs.AI
Abstract
We present Self-MoE, an approach that transforms a monolithic LLM into a
compositional, modular system of self-specialized experts, named MiXSE (MiXture
of Self-specialized Experts). Our approach leverages self-specialization, which
constructs expert modules using self-generated synthetic data, each equipped
with a shared base LLM and incorporating self-optimized routing. This allows
for dynamic and capability-specific handling of various target tasks, enhancing
overall capabilities, without extensive human-labeled data and added
parameters. Our empirical results reveal that specializing LLMs may exhibit
potential trade-offs in performance on non-specialized tasks. On the other
hand, our Self-MoE demonstrates substantial improvements over the base LLM
across diverse benchmarks such as knowledge, reasoning, math, and coding. It
also consistently outperforms other methods, including instance merging and
weight merging, while offering better flexibility and interpretability by
design with semantic experts and routing. Our findings highlight the critical
role of modularity and the potential of self-improvement in achieving
efficient, scalable, and adaptable systems.
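To make the composition described in the abstract concrete, below is a minimal, illustrative sketch of the general idea: a frozen shared base layer augmented by lightweight LoRA-style expert adapters whose outputs are combined by a learned router. This is not the authors' implementation; all names here (`LoRAExpert`, `BaseLinearWithExperts`, `num_experts`, `rank`) are hypothetical, and the routing shown is a simple softmax gate standing in for the paper's self-optimized routing.

```python
# Illustrative sketch only, assuming a frozen base linear layer plus routed
# LoRA-style expert adapters; not the MiXSE implementation from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """Low-rank adapter standing in for one self-specialized expert (hypothetical)."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)   # project to low rank
        self.up = nn.Linear(rank, d_out, bias=False)    # project back up
        nn.init.zeros_(self.up.weight)                  # start as a zero delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class BaseLinearWithExperts(nn.Module):
    """Frozen base layer whose output is augmented by a gated mixture of
    expert adapters, sketching the shared-base-plus-experts composition."""

    def __init__(self, d_in: int, d_out: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.requires_grad_(False)                 # shared base weights stay fixed
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out, rank) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_in, num_experts)      # learned routing over experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = F.softmax(self.router(x), dim=-1)       # (batch, seq, num_experts)
        base_out = self.base(x)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)
        mixed = (expert_out * gates.unsqueeze(-2)).sum(dim=-1)
        return base_out + mixed


# Usage with toy dimensions.
layer = BaseLinearWithExperts(d_in=16, d_out=16, num_experts=4, rank=4)
tokens = torch.randn(2, 5, 16)                          # (batch, seq, hidden)
print(layer(tokens).shape)                              # torch.Size([2, 5, 16])
```

Because the adapters are initialized as zero deltas and the base weights are frozen, the module starts out equivalent to the base layer; only the experts and router would be trained, which mirrors the abstract's claim of adding capability without extensive added parameters.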