Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say

Abstract

Open-source Large Language Models (LLMs) increasingly specialize by domain (e.g., math, code, general reasoning), motivating systems that leverage complementary strengths across models. Prior multi-LLM approaches either (i) route a query to one or a few experts and generate independently, (ii) aggregate outputs from each model via costly multi-turn exchanges, or (iii) fuse weights into a single model, which typically requires architectural homogeneity. We introduce Mixture of Thoughts (MoT), a simple method for latent-level collaboration among heterogeneous experts under a global routing scheme. For each query, a lightweight router selects the top-K experts and designates a primary expert; uniformly placed interaction layers project hidden states into a shared latent space, where the primary expert performs cross-attention over its active (selected) peers. Pre-trained experts remain frozen; only the router and the lightweight interaction layers are trained, using a novel joint training objective that improves both expert selection and inter-expert collaboration. Across five in-distribution (ID) and three out-of-distribution (OOD) benchmarks, MoT surpasses the current routing and aggregation-based state of the art, Avengers, by +0.38% and +2.92%, respectively. Further, MoT significantly outperforms the best-performing single model. It achieves this with single-pass inference, runtime comparable to routing baselines, and none of the overheads of iterative aggregation. MoT offers a simple latent-space mechanism for combining heterogeneous LLMs, a practical step toward broader multi-LLM collaboration. Our code is publicly available at https://github.com/jacobfa/mot.
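The routing and interaction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of the highest-scoring expert as the primary, the single-vector hidden states, the random (untrained) projection matrices, and the single-head attention are all assumptions made for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def route_top_k(scores, k):
    """Select the k highest-scoring experts.

    Assumption: the top-scoring expert acts as the primary expert.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = order[:k]
    return selected[0], selected


def cross_attend(primary_latent, peer_latents):
    """Single-head scaled dot-product attention.

    The primary expert's latent is the query; peer latents serve as
    both keys and values.
    """
    d = len(primary_latent)
    logits = [
        sum(q * p for q, p in zip(primary_latent, peer)) / math.sqrt(d)
        for peer in peer_latents
    ]
    weights = softmax(logits)
    mixed = [
        sum(w * peer[j] for w, peer in zip(weights, peer_latents))
        for j in range(d)
    ]
    return mixed, weights


rng = random.Random(0)
hidden_sizes = [6, 8, 5, 7]  # heterogeneous experts (hypothetical sizes)
latent_dim = 4               # shared latent dimension (hypothetical)

# One projection per expert into the shared latent space. These would be
# trained in practice; random values stand in for them here.
projections = [
    [[rng.gauss(0, 0.1) for _ in range(h)] for _ in range(latent_dim)]
    for h in hidden_sizes
]
hidden_states = [[rng.gauss(0, 1) for _ in range(h)] for h in hidden_sizes]

router_scores = [0.1, 2.0, 0.7, 1.5]  # hypothetical router output for one query
primary, selected = route_top_k(router_scores, k=3)

latents = {i: matvec(projections[i], hidden_states[i]) for i in selected}
peers = [latents[i] for i in selected if i != primary]
mixed, weights = cross_attend(latents[primary], peers)
```

In this sketch the frozen experts are represented only by their hidden states; the projections and the router scores are the parts that would be learned.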
