Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning
March 7, 2025
Authors: Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal
cs.AI
Abstract
Combining existing pre-trained expert LLMs is a promising avenue for scalably
tackling large-scale and diverse tasks. However, selecting experts at the task
level is often too coarse-grained, as heterogeneous tasks may require different
expertise for each instance. To enable adaptive instance-level mixing of
pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and
gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained
approach to selection by emphasizing skills, e.g., algebra in math or molecular
biology in biomedical reasoning. We propose a skill-based recruiting strategy
that dynamically selects the most relevant set of expert LLMs for diverse
reasoning tasks based on their strengths. Each selected expert then generates
its own reasoning, resulting in k outputs from k experts, which are then
synthesized into a final high-quality response by an aggregator chosen based on
its ability to integrate diverse reasoning outputs. We show that Symbolic-MoE's
instance-level expert selection improves performance by a large margin but --
when implemented naively -- can introduce a high computational overhead due to
the need for constant model loading and offloading. To address this, we
implement a batch inference strategy that groups instances based on their
assigned experts, loading each model only once. This allows us to integrate 16
expert models on 1 GPU with a time cost comparable to or better than prior
multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse
benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we demonstrate that
Symbolic-MoE outperforms strong LLMs like GPT4o-mini, as well as multi-agent
approaches, with an absolute average improvement of 8.15% over the best
multi-agent baseline. Moreover, Symbolic-MoE removes the need for expensive
multi-round discussions, outperforming discussion baselines with less
computation.
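To make the recruiting step concrete, the following is a minimal Python sketch of instance-level, skill-based expert selection in the spirit of the abstract. It assumes each expert LLM has a precomputed skill profile (e.g., per-skill validation accuracy) and that each instance has already been annotated with the skills it requires; the names ExpertProfile and recruit_experts, as well as the example models and scores, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative sketch of instance-level, skill-based expert recruiting.
# Skill profiles would come from a small validation set; all names are hypothetical.

@dataclass
class ExpertProfile:
    name: str                           # e.g., a model identifier
    skill_strengths: Dict[str, float]   # skill -> validation accuracy in [0, 1]

def recruit_experts(
    required_skills: List[str],
    experts: List[ExpertProfile],
    k: int = 3,
) -> List[ExpertProfile]:
    """Rank experts by their aggregate strength on the skills an instance needs."""
    def suitability(expert: ExpertProfile) -> float:
        # Missing skills contribute 0; a real system might back off to a global score.
        return sum(expert.skill_strengths.get(s, 0.0) for s in required_skills)
    return sorted(experts, key=suitability, reverse=True)[:k]

# Example: a math question tagged with "algebra" and "number theory".
experts = [
    ExpertProfile("math-model", {"algebra": 0.82, "number theory": 0.75}),
    ExpertProfile("bio-model", {"molecular biology": 0.88, "genetics": 0.79}),
    ExpertProfile("general-model", {"algebra": 0.64, "molecular biology": 0.61}),
]
selected = recruit_experts(["algebra", "number theory"], experts, k=2)
print([e.name for e in selected])  # -> ['math-model', 'general-model']
```

In the full framework, the k recruited experts each produce their own reasoning, and a separately chosen aggregator synthesizes their outputs into the final answer; that aggregation step is omitted from this sketch.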
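The batch inference strategy can be sketched in a similar spirit: instead of loading and offloading models per instance, instances are regrouped by their assigned experts so that each model is loaded once and answers its whole batch. The load_model and generate callables below are placeholders for whatever inference backend is used; this is a sketch of the grouping idea under those assumptions, not the paper's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def run_grouped_inference(
    assignments: List[Tuple[str, List[str]]],           # (instance_id, expert names)
    instances: Dict[str, str],                           # instance_id -> prompt
    load_model: Callable[[str], object],                 # loads one expert onto the GPU
    generate: Callable[[object, List[str]], List[str]],  # batched generation
) -> Dict[str, List[str]]:
    """Group instances by assigned expert so each model is loaded at most once."""
    # Invert the mapping: expert -> instances that recruited it.
    per_expert: Dict[str, List[str]] = defaultdict(list)
    for instance_id, expert_names in assignments:
        for name in expert_names:
            per_expert[name].append(instance_id)

    # Load each expert once, answer its whole batch, then release it before the next.
    outputs: Dict[str, List[str]] = defaultdict(list)
    for expert_name, instance_ids in per_expert.items():
        model = load_model(expert_name)
        prompts = [instances[i] for i in instance_ids]
        for instance_id, answer in zip(instance_ids, generate(model, prompts)):
            outputs[instance_id].append(answer)
        del model  # drop the reference so the backend can free GPU memory
    return outputs
```

Grouping instances this way is what lets the paper serve 16 expert models on a single GPU with a time cost comparable to or better than multi-agent baselines that use 4 GPUs.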