Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
June 17, 2024
作者: Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter
cs.AI
Abstract
We present Self-MoE, an approach that transforms a monolithic LLM into a
compositional, modular system of self-specialized experts, named MiXSE (MiXture
of Self-specialized Experts). Our approach leverages self-specialization, which
constructs expert modules using self-generated synthetic data, each equipped
with a shared base LLM and incorporating self-optimized routing. This allows
for dynamic and capability-specific handling of various target tasks, enhancing
overall capabilities, without extensive human-labeled data and added
parameters. Our empirical results reveal that specializing LLMs may exhibit
potential trade-offs in performance on non-specialized tasks. On the other
hand, our Self-MoE demonstrates substantial improvements over the base LLM
across diverse benchmarks such as knowledge, reasoning, math, and coding. It
also consistently outperforms other methods, including instance merging and
weight merging, while offering better flexibility and interpretability by
design with semantic experts and routing. Our findings highlight the critical
role of modularity and the potential of self-improvement in achieving
efficient, scalable, and adaptable systems.
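To make the described architecture concrete, below is a minimal sketch of a MiXSE-style layer: a frozen shared base projection, several lightweight self-specialized expert adapters (modeled here as LoRA-style low-rank deltas), and a learned router that mixes expert outputs per token. The module names, expert rank, number of experts, and soft (rather than top-1) routing are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
# Sketch of a shared base layer + self-specialized experts + learned router.
# Assumes experts are LoRA-style adapters over a frozen base weight; details
# such as rank, expert count, and soft mixing are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """Low-rank delta applied on top of a frozen shared base projection."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class MiXSELayer(nn.Module):
    """Shared base + mixture of self-specialized experts with a router (sketch)."""

    def __init__(self, dim: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)  # stands in for frozen base-LLM weights
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.experts = nn.ModuleList([LoRAExpert(dim, rank) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)  # routing trained on self-generated data

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        base_out = self.base(x)
        gates = F.softmax(self.router(x), dim=-1)                       # (batch, seq, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, seq, dim, E)
        mixed = (expert_out * gates.unsqueeze(-2)).sum(dim=-1)          # gated expert deltas
        return base_out + mixed


if __name__ == "__main__":
    layer = MiXSELayer(dim=64, num_experts=4)
    hidden = torch.randn(2, 10, 64)
    print(layer(hidden).shape)  # torch.Size([2, 10, 64])
```

Because the base weights stay frozen and each expert is a small low-rank adapter, this kind of design adds few trainable parameters while letting the router dispatch inputs to capability-specific experts, consistent with the abstract's claim of dynamic, capability-specific handling without extensive added parameters.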