
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say

September 25, 2025
Authors: Jacob Fein-Ashley, Dhruv Parikh, Rajgopal Kannan, Viktor Prasanna
cs.AI

Abstract

Open-source Large Language Models (LLMs) increasingly specialize by domain (e.g., math, code, general reasoning), motivating systems that leverage complementary strengths across models. Prior multi-LLM approaches either (i) route a query to one or a few experts and generate independently, (ii) aggregate outputs from each model via costly multi-turn exchanges, or (iii) fuse weights into a single model, typically requiring architectural homogeneity. We introduce Mixture of Thoughts (MoT), a simple method for latent-level collaboration among heterogeneous experts under a global routing scheme. For each query, a lightweight router selects top-K experts and designates a primary expert; uniformly placed interaction layers project hidden states into a shared latent space where the primary expert performs cross-attention over its active (selected) peers. Pre-trained experts remain frozen; only the router and the lightweight interaction layers are trained with a novel joint training objective that improves both the expert selection and inter-expert collaboration. Across five in-distribution (ID) and three out-of-distribution (OOD) benchmarks, MoT surpasses the current routing and aggregation-based state-of-the-art, Avengers, by +0.38% and +2.92%, respectively. Further, MoT significantly outperforms the best-performing single model. It achieves this with single-pass inference, runtime comparable to routing baselines, and none of the overheads of iterative aggregation. MoT offers a simple latent-space mechanism for combining heterogeneous LLMs, a practical step toward broader multi-LLM collaboration. Our code is publicly available at https://github.com/jacobfa/mot.
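
For readers who want a concrete picture of the mechanism described above, the following is a minimal, hypothetical PyTorch sketch of one routing-plus-interaction step: heterogeneous experts' hidden states are projected into a shared latent space, a lightweight router picks the top-K experts and a primary, and the primary cross-attends over its active peers. All names (MoTInteractionSketch, expert_dims, d_latent, etc.) are illustrative assumptions, not the paper's code; the sketch omits the joint training objective and the uniform placement of interaction layers across the experts' depth. See https://github.com/jacobfa/mot for the authors' implementation.

```python
import torch
import torch.nn as nn


class MoTInteractionSketch(nn.Module):
    """Hypothetical interaction layer: project each expert's hidden states into a
    shared latent space, then let the primary expert cross-attend over its
    active (router-selected) peers. In MoT the pre-trained experts stay frozen;
    only layers like this and the router would be trained."""

    def __init__(self, expert_dims, d_latent=512, n_heads=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One projection pair per (heterogeneous) expert, into and out of the shared latent space.
        self.to_latent = nn.ModuleList([nn.Linear(d, d_latent) for d in expert_dims])
        self.from_latent = nn.ModuleList([nn.Linear(d_latent, d) for d in expert_dims])
        # Lightweight router scores experts from a pooled latent representation of the query.
        self.router = nn.Linear(d_latent, len(expert_dims))
        self.cross_attn = nn.MultiheadAttention(d_latent, n_heads, batch_first=True)

    def forward(self, hidden_states):
        # hidden_states: list of per-expert tensors, each of shape (batch, seq, d_expert_i).
        latents = [proj(h) for proj, h in zip(self.to_latent, hidden_states)]
        pooled = torch.stack([z.mean(dim=1) for z in latents], dim=1)  # (B, E, d_latent)
        scores = self.router(pooled.mean(dim=1))                       # (B, E) routing logits
        topk = scores.topk(self.top_k, dim=-1).indices                 # active experts per query
        primary = topk[:, 0]                                           # treat the top-1 expert as primary (a simplification)

        outputs = []
        for b in range(scores.size(0)):
            p = primary[b].item()
            q = latents[p][b:b + 1]                                    # primary expert's latent states
            kv = torch.cat([latents[e.item()][b:b + 1] for e in topk[b]], dim=1)
            attended, _ = self.cross_attn(q, kv, kv)                   # cross-attention over active peers
            outputs.append(self.from_latent[p](q + attended))          # residual, mapped back to the primary's width
        return outputs, scores                                         # scores would feed the joint routing objective


# Toy usage with two frozen "experts" of different hidden widths.
if __name__ == "__main__":
    B, T = 2, 16
    h = [torch.randn(B, T, 1024), torch.randn(B, T, 768)]
    layer = MoTInteractionSketch(expert_dims=[1024, 768], top_k=2)
    outs, router_scores = layer(h)
    print(outs[0].shape, router_scores.shape)
```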