Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say

Abstract

Open-source Large Language Models (LLMs) increasingly specialize by domain (e.g., math, code, general reasoning), motivating systems that leverage complementary strengths across models. Prior multi-LLM approaches either (i) route a query to one or a few experts that generate independently, (ii) aggregate outputs from each model via costly multi-turn exchanges, or (iii) fuse weights into a single model, which typically requires architectural homogeneity. We introduce Mixture of Thoughts (MoT), a simple method for latent-level collaboration among heterogeneous experts under a global routing scheme. For each query, a lightweight router selects the top-K experts and designates a primary expert; uniformly placed interaction layers project hidden states into a shared latent space, where the primary expert performs cross-attention over its active (selected) peers. Pre-trained experts remain frozen; only the router and the lightweight interaction layers are trained, using a novel joint training objective that improves both expert selection and inter-expert collaboration. Across five in-distribution (ID) and three out-of-distribution (OOD) benchmarks, MoT surpasses Avengers, the current routing- and aggregation-based state of the art, by +0.38% and +2.92%, respectively. Further, MoT significantly outperforms the best-performing single model. It achieves this with single-pass inference, runtime comparable to routing baselines, and none of the overheads of iterative aggregation. MoT offers a simple latent-space mechanism for combining heterogeneous LLMs, a practical step toward broader multi-LLM collaboration. Our code is publicly available at https://github.com/jacobfa/mot.
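
The mechanism described in the abstract (top-K routing, a designated primary expert, and cross-attention over peers in a shared latent space) can be illustrated with a small sketch. This is not the authors' implementation, which lives in the linked repository. It is a toy under loud assumptions: each expert contributes a single hidden vector rather than a token sequence, the primary expert is taken to be the top-scoring one, keys and values share one projection, and the result is added back as a residual. The names `route_top_k` and `interact` and all projection matrices are hypothetical.

```python
import math


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(scores, k):
    """Pick the k highest-scoring experts.

    Assumption: the highest-scoring expert is treated as the primary one.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:k]
    return selected, selected[0]


def interact(primary_h, peer_hs, proj_primary, proj_peers, proj_back):
    """One hypothetical interaction layer.

    primary_h    : hidden vector of the primary expert (size d_primary)
    peer_hs      : hidden vectors of the selected peers (sizes may differ)
    proj_primary : d_latent x d_primary matrix mapping the primary into latent space
    proj_peers   : one d_latent x d_peer matrix per peer
    proj_back    : d_primary x d_latent matrix mapping the mix back to the primary
    """
    query = matvec(proj_primary, primary_h)
    # Simplification: the same latent vector serves as both key and value.
    latents = [matvec(p, h) for p, h in zip(proj_peers, peer_hs)]
    scale = math.sqrt(len(query))
    weights = softmax(
        [sum(q * c for q, c in zip(query, lat)) / scale for lat in latents]
    )
    mixed = [
        sum(w * lat[j] for w, lat in zip(weights, latents))
        for j in range(len(query))
    ]
    update = matvec(proj_back, mixed)
    # Residual update: the frozen expert's state plus the peers' contribution.
    return [h + u for h, u in zip(primary_h, update)], weights


if __name__ == "__main__":
    # Hypothetical router scores for three experts; keep the best two.
    selected, primary = route_top_k([0.2, 0.7, 0.1], k=2)
    print(selected, primary)  # [1, 0] 1
```

In this sketch only the projection matrices and the router scores would be trainable, mirroring the abstract's statement that the pre-trained experts stay frozen. Because each peer has its own projection into the latent space, experts with different hidden sizes can still be combined.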
