IR3DE: A Linear Router for Large Language Models

Abstract

Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers have been proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong generalist LLMs or require substantial training to support domain-expertise routing. In this paper, we propose IR3DE, a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. We evaluate IR3DE in two Causal Language Modeling (CLM) settings, in which the task is next-token prediction across all domains, and in one reasoning setting, in which each domain has its own distinct reasoning task. Despite being a linear router, IR3DE achieves performance comparable to the baselines in both CLM settings and surpasses them in the reasoning setting, where it reaches a normalized performance of 98.4%. Moreover, IR3DE allows domain experts to be added or removed without retraining the router from scratch, so a dynamic set of LLMs can be served with minimal disruption to the router itself. Our code is available at github.com/gensyn-ai/IR3DE.
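
The abstract does not spell out the router's formulation, so the following is only a rough sketch of what a ridge-regression router of this kind might look like, not the IR3DE implementation. It assumes that each prompt is represented by a fixed embedding vector and that each expert has a scalar quality score per training prompt; the class `RidgeRouter`, its methods, the expert names, and the synthetic data are all hypothetical. Each expert gets its own closed-form ridge solution, which is one way adding or removing an expert could leave the other experts' weights untouched.

```python
import numpy as np


class RidgeRouter:
    """Hypothetical linear router: one ridge-regression weight vector per expert."""

    def __init__(self, dim, lam=1.0):
        self.dim = dim
        self.lam = lam          # ridge regularization strength (assumed value)
        self.weights = {}       # expert name -> weight vector of shape (dim,)

    def add_expert(self, name, X, y):
        # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y.
        # Only this expert's weights are fit; existing experts are not refit.
        A = X.T @ X + self.lam * np.eye(self.dim)
        self.weights[name] = np.linalg.solve(A, X.T @ y)

    def remove_expert(self, name):
        # Dropping an expert only deletes its weight vector.
        del self.weights[name]

    def route(self, x):
        # Choose the expert with the highest predicted score for embedding x.
        return max(self.weights, key=lambda name: float(self.weights[name] @ x))


# Synthetic demo: expert "math" scores well along axis 0, "code" along axis 1.
rng = np.random.default_rng(0)
dim = 8
X = rng.normal(size=(200, dim))     # stand-in for prompt embeddings
basis = np.eye(dim)

router = RidgeRouter(dim, lam=1e-3)
router.add_expert("math", X, X @ basis[0])
router.add_expert("code", X, X @ basis[1])

code_weights_before = router.weights["code"].copy()
router.add_expert("law", X, X @ basis[2])   # new expert, no refit of the others

prompt = basis[0] + 0.5 * basis[1]          # mostly "math", partly "code"
choice_with_math = router.route(prompt)
router.remove_expert("math")
choice_without_math = router.route(prompt)
```

In this toy setup the prompt is routed to "math" while that expert is registered and falls back to "code" once it is removed. Fitting each expert independently is a design choice of the sketch; it trades any cross-expert coupling for the ability to change the expert set with a single linear solve.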