Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Abstract

We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization: expert modules are constructed using self-generated synthetic data, each equipped with a shared base LLM and incorporating self-optimized routing. This allows dynamic, capability-specific handling of various target tasks and enhances overall capabilities without requiring extensive human-labeled data or added parameters. Our empirical results reveal that specializing an LLM can incur trade-offs in performance on non-specialized tasks. In contrast, Self-MoE demonstrates substantial improvements over the base LLM across diverse benchmarks covering knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design through semantic experts and routing. Our findings highlight the critical role of modularity and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
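To make the idea of experts sharing one base model behind a router more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each expert is a small low-rank update applied on top of a single frozen base layer, and that the router is a linear layer followed by a softmax with top-k selection. The class name `MixtureOfExperts`, the expert names, the rank, and the random initialisation are all hypothetical and chosen only for illustration; the abstract does not specify any of them.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MixtureOfExperts:
    """Toy layer: one shared base matrix plus per-expert low-rank deltas.

    Assumption: experts are low-rank updates and routing is top-k softmax.
    Weights are random here; in practice they would be trained.
    """

    def __init__(self, dim, expert_names, rank=2, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.expert_names = list(expert_names)
        self.base = rand(dim, dim)  # shared by every expert
        # Each expert is a pair (up: dim x rank, down: rank x dim).
        self.experts = [(rand(dim, rank), rand(rank, dim)) for _ in self.expert_names]
        self.router = rand(len(self.expert_names), dim)  # one score row per expert

    def route(self, x):
        """Return a probability for each expert given input x."""
        return softmax(matvec(self.router, x))

    def forward(self, x, top_k=1):
        """Apply the base layer, then add the gated deltas of the top-k experts."""
        weights = self.route(x)
        chosen = sorted(range(len(weights)), key=weights.__getitem__, reverse=True)[:top_k]
        norm = sum(weights[i] for i in chosen)  # renormalise over chosen experts
        out = matvec(self.base, x)
        for i in chosen:
            up, down = self.experts[i]
            delta = matvec(up, matvec(down, x))
            gate = weights[i] / norm
            out = [o + gate * d for o, d in zip(out, delta)]
        return out, [self.expert_names[i] for i in chosen]


if __name__ == "__main__":
    moe = MixtureOfExperts(4, ["knowledge", "reasoning", "math", "coding"])
    output, used = moe.forward([1.0, 0.5, -0.5, 2.0], top_k=1)
    print("experts used:", used)
```

Because the names of the selected experts are returned alongside the output, a caller can inspect which expert handled a given input, which is one simple way to picture the interpretability that semantic experts and routing are said to offer.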
