Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

Abstract

Combining existing pre-trained expert LLMs is a promising avenue for scalably tackling large-scale and diverse tasks. However, selecting experts at the task level is often too coarse-grained, as heterogeneous tasks may require different expertise for each instance. To enable adaptive instance-level mixing of pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained approach to selection by emphasizing skills, e.g., algebra in math or molecular biology in biomedical reasoning. We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths. Each selected expert then generates its own reasoning, resulting in k outputs from k experts, which are then synthesized into a final high-quality response by an aggregator chosen for its ability to integrate diverse reasoning outputs. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but, when implemented naively, can introduce a high computational overhead due to the need for constant model loading and offloading. To address this, we implement a batch inference strategy that groups instances based on their assigned experts, loading each model only once. This allows us to integrate 16 expert models on a single GPU with a time cost comparable to or better than prior multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we demonstrate that Symbolic-MoE outperforms strong LLMs like GPT-4o-mini, as well as multi-agent approaches, with an average absolute improvement of 8.15% over the best multi-agent baseline. Moreover, Symbolic-MoE removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.
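The abstract names two mechanisms without spelling out their implementation: recruiting k experts per instance by skill, and grouping instances by assigned expert so each model is loaded only once. The Python sketch that follows illustrates both under loudly stated assumptions: the expert names, per-skill scores, and the summed-score relevance rule are hypothetical, and `load_model`/`generate` are placeholder callables standing in for whatever inference backend is used. It is an illustration, not the paper's implementation, and it omits the aggregation step.

```python
from collections import defaultdict

# Hypothetical skill profiles (illustrative names and scores, not from the paper).
EXPERT_PROFILES = {
    "math-expert": {"algebra": 0.9, "geometry": 0.7, "molecular biology": 0.1},
    "bio-expert": {"algebra": 0.2, "molecular biology": 0.9, "genetics": 0.8},
    "general-expert": {"algebra": 0.5, "molecular biology": 0.5, "genetics": 0.4},
}


def recruit(required_skills, profiles, k):
    """Pick the k experts with the highest summed score over the required skills."""
    def relevance(expert):
        return sum(profiles[expert].get(skill, 0.0) for skill in required_skills)
    # sorted() stays stable with reverse=True, so tied experts keep insertion order.
    return sorted(profiles, key=relevance, reverse=True)[:k]


def batch_by_expert(assignments):
    """Invert instance -> experts into expert -> instances."""
    batches = defaultdict(list)
    for instance_id, experts in assignments.items():
        for expert in experts:
            batches[expert].append(instance_id)
    return dict(batches)


def run_batched(batches, load_model, generate):
    """Load each expert once, answer all of its assigned instances, then release it."""
    outputs = defaultdict(dict)  # instance_id -> {expert: output}
    for expert, instance_ids in batches.items():
        model = load_model(expert)  # one load per expert, not per instance
        for instance_id in instance_ids:
            outputs[instance_id][expert] = generate(model, instance_id)
        del model  # stand-in for offloading the model before the next expert
    return dict(outputs)


# Usage with two hypothetical instances, each tagged with the skills it needs.
instances = {"q1": ["algebra"], "q2": ["molecular biology", "genetics"]}
assignments = {qid: recruit(skills, EXPERT_PROFILES, k=2) for qid, skills in instances.items()}
batches = batch_by_expert(assignments)

loads = []  # records every model load


def fake_load(name):
    loads.append(name)
    return name


def fake_generate(model, instance_id):
    return f"{model} answer to {instance_id}"


outputs = run_batched(batches, fake_load, fake_generate)
print(len(loads), "loads for", sum(len(v) for v in assignments.values()), "expert calls")
# 3 loads for 4 expert calls
```

With k = 2, the two instances produce four (instance, expert) calls, but grouping by expert needs only three loads because `general-expert` serves both instances; loading per instance would cost one load per call. The savings grow as more instances share the same experts.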