RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents

Summary

RouteMoA improves the efficiency of Mixture-of-Agents (MoA) through dynamic routing. It uses a lightweight scorer for initial screening, predicting coarse-grained performance from the query alone and narrowing the candidates to a high-potential subset without inference. A mixture of judges then refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference. Finally, a model ranking mechanism selects models by balancing performance, cost, and latency. RouteMoA outperforms MoA across varying tasks and model pool sizes, and reduces cost by 89.8% and latency by 63.6% in the large-scale model pool.

Abstract

Mixture-of-Agents (MoA) improves LLM performance through layered collaboration, but its dense topology raises costs and latency. Existing methods employ LLM judges to filter responses, yet still require all models to perform inference before judging, failing to cut costs effectively. They also lack model selection criteria and struggle with large model pools, where full inference is costly and can exceed context limits. To address this, we propose RouteMoA, an efficient mixture-of-agents framework with dynamic routing. It employs a lightweight scorer to perform initial screening by predicting coarse-grained performance from the query, narrowing candidates to a high-potential subset without inference. A mixture of judges then refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference. Finally, a model ranking mechanism selects models by balancing performance, cost, and latency. RouteMoA outperforms MoA across varying tasks and model pool sizes, reducing cost by 89.8% and latency by 63.6% in the large-scale model pool.
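The three stages described above (query-only screening, posterior refinement by judges, and ranking under cost and latency) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the `Candidate` fields, the equal-weight blend in `refine`, the linear utility in `rank`, and all weight values are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # All fields are hypothetical; the source does not define a data model.
    name: str
    predicted_score: float  # prior estimate from a query-only scorer, in [0, 1]
    cost: float             # relative cost per call, normalised to [0, 1]
    latency: float          # relative latency, normalised to [0, 1]


def screen(candidates, keep):
    """Stage 1: keep the `keep` models with the highest predicted score.

    No model is called here; only the scorer's prior estimate is used.
    """
    by_prior = sorted(candidates, key=lambda c: c.predicted_score, reverse=True)
    return by_prior[:keep]


def refine(candidate, judge_scores, prior_weight=0.5):
    """Stage 2: blend the prior with judge scores on existing outputs.

    `judge_scores` stands in for self- and cross-assessment results.
    The linear blend is an assumption, not the paper's stated rule.
    """
    if not judge_scores:
        return candidate.predicted_score  # nothing to correct with yet
    posterior = sum(judge_scores) / len(judge_scores)
    return prior_weight * candidate.predicted_score + (1 - prior_weight) * posterior


def rank(candidates, scores, cost_weight=0.2, latency_weight=0.1):
    """Stage 3: order models by score minus cost and latency penalties.

    The linear utility and its weights are assumed for this sketch.
    """
    def utility(c):
        return scores[c.name] - cost_weight * c.cost - latency_weight * c.latency

    return sorted(candidates, key=utility, reverse=True)
```

In this sketch, a cheaper model with the same refined score as an expensive one ranks first, which is the kind of trade-off the ranking mechanism is described as making.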
