

RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents

January 26, 2026
作者: Jize Wang, Han Wu, Zhiyuan You, Yiming Song, Yijun Wang, Zifei Shan, Yining Li, Songyang Zhang, Xinyi Le, Cailian Chen, Xinping Guan, Dacheng Tao
cs.AI

Abstract

Mixture-of-Agents (MoA) improves LLM performance through layered collaboration, but its dense topology raises costs and latency. Existing methods employ LLM judges to filter responses, yet still require all models to perform inference before judging, failing to cut costs effectively. They also lack model selection criteria and struggle with large model pools, where full inference is costly and can exceed context limits. To address this, we propose RouteMoA, an efficient mixture-of-agents framework with dynamic routing. It employs a lightweight scorer to perform initial screening by predicting coarse-grained performance from the query, narrowing candidates to a high-potential subset without inference. A mixture of judges then refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference. Finally, a model ranking mechanism selects models by balancing performance, cost, and latency. RouteMoA outperforms MoA across varying tasks and model pool sizes, reducing cost by 89.8% and latency by 63.6% in the large-scale model pool.
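To make the routing pipeline concrete, below is a minimal Python sketch of the three-stage flow the abstract describes (screen, judge, rank). Everything in it, including the `Candidate` fields, the `scorer`/`judges` interfaces, the trade-off weights, and the 50/50 score averaging, is an illustrative assumption rather than the paper's actual implementation.

```python
# Minimal, illustrative sketch of a RouteMoA-style routing loop.
# All names, weights, and formulas here are assumptions for illustration,
# not the paper's actual method.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    name: str
    cost: float      # relative cost per call (assumed scale)
    latency: float   # relative latency per call (assumed scale)


def route(
    query: str,
    pool: List[Candidate],
    scorer: Callable[[str, Candidate], float],   # lightweight scorer: predicts quality from the query alone
    judges: List[Callable[[str, Dict[str, str]], Dict[str, float]]],  # judges re-score existing outputs
    run_model: Callable[[str, Candidate], str],  # actually calls the selected LLM
    k: int = 3,
    alpha: float = 1.0, beta: float = 0.3, gamma: float = 0.3,  # assumed trade-off weights
) -> List[str]:
    # 1) Initial screening: predict coarse-grained performance without any inference.
    prior = {c.name: scorer(query, c) for c in pool}
    shortlist = sorted(pool, key=lambda c: prior[c.name], reverse=True)[:k]

    # 2) Run only the shortlisted models; this is where the cost saving comes from.
    outputs = {c.name: run_model(query, c) for c in shortlist}

    # 3) Posterior correction: judges assess the outputs that already exist,
    #    so no extra generation by the candidate models is triggered.
    for judge in judges:
        for name, score in judge(query, outputs).items():
            prior[name] = 0.5 * prior[name] + 0.5 * score  # assumed blending rule

    # 4) Final ranking balances predicted performance against cost and latency.
    def utility(c: Candidate) -> float:
        return alpha * prior[c.name] - beta * c.cost - gamma * c.latency

    ranked = sorted(shortlist, key=utility, reverse=True)
    return [c.name for c in ranked]
```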