IR3DE: A Linear Router for Large Language Models

Abstract

Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers have been proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong generalist LLMs or require substantial training to support domain-expertise routing. In this paper, we propose IR3DE, a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. We evaluate IR3DE in two Causal Language Modeling (CLM) settings, where the task is next-token prediction for all domains, and in one reasoning setting, where each domain has its own distinct reasoning task. Despite being a linear router, IR3DE achieves performance comparable to the other baselines in both CLM settings and surpasses them in the reasoning setting, with a normalized performance of 98.4%. Moreover, IR3DE allows domain experts to be added or removed without retraining the router from scratch, so a dynamic set of LLMs can be served with minimal disruption to the router itself. Our code is available at: github.com/gensyn-ai/IR3DE.
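To make the idea of a ridge-regression router concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each prompt is already represented by a fixed-size feature vector and that a per-prompt quality score is available for every expert; the abstract specifies neither. The class name `RidgeRouter`, its methods, and the synthetic data are all hypothetical. Under these assumptions each expert gets its own closed-form ridge head, so adding or removing an expert touches only that expert's weight vector.

```python
import numpy as np


class RidgeRouter:
    """Hypothetical linear router with one ridge-regression head per expert."""

    def __init__(self, dim, lam=1.0):
        self.dim = dim
        self.lam = lam        # ridge penalty (assumed hyperparameter)
        self.X = None         # training prompt features, shape (n, dim)
        self.gram = None      # X^T X + lam * I, shared by all experts
        self.heads = {}       # expert name -> weight vector, shape (dim,)

    def fit_features(self, X):
        # Cache the regularized Gram matrix once; every head reuses it.
        self.X = np.asarray(X, dtype=float)
        self.gram = self.X.T @ self.X + self.lam * np.eye(self.dim)

    def add_expert(self, name, scores):
        # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y.
        y = np.asarray(scores, dtype=float)
        self.heads[name] = np.linalg.solve(self.gram, self.X.T @ y)

    def remove_expert(self, name):
        # Dropping an expert leaves all other heads untouched.
        del self.heads[name]

    def route(self, x):
        # Send the prompt to the expert with the highest predicted score.
        x = np.asarray(x, dtype=float)
        return max(self.heads, key=lambda name: float(self.heads[name] @ x))


# Synthetic demo: expert "math" does well when feature 0 is large,
# expert "code" does well when feature 1 is large (made-up data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8))

router = RidgeRouter(dim=8, lam=1.0)
router.fit_features(X_train)
router.add_expert("math", X_train[:, 0])
router.add_expert("code", X_train[:, 1])

prompt = np.zeros(8)
prompt[0] = 2.0
print(router.route(prompt))

# A new expert can be added later without refitting the existing heads.
router.add_expert("law", X_train[:, 2])
```

Because the heads share one Gram matrix but are otherwise independent, the cost of adding an expert in this sketch is a single linear solve. How IR3DE itself builds features and targets is described in the paper and the linked repository.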