
IR3DE: A Linear Router for Large Language Models

June 4, 2026
Authors: Eros Fanì, Oğuzhan Ersoy
cs.AI

Abstract

Foundational Large Language Models (LLMs) are proficient on a wide range of general tasks, and domain-expert LLMs achieve remarkable results on various specialized tasks. As the list of available LLMs keeps growing, inference routers have been proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong generalist LLMs or require substantial training to support domain-expertise routing. In this paper, we propose IR3DE, a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. We evaluate IR3DE in two Causal Language Modeling (CLM) settings, where the task is next-token prediction in every domain, and in one reasoning setting, where each domain has its own distinct reasoning task. Despite being a linear router, IR3DE achieves performance comparable to the other baselines in both CLM settings and surpasses them in the reasoning setting, with a normalized performance of 98.4%. Moreover, IR3DE allows domain experts to be added or removed without retraining the router from scratch, so a dynamic set of LLMs can be served with minimal disruption to the router itself. Our code is available at: github.com/gensyn-ai/IR3DE.
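
To make the idea of a linear, ridge-regression router concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation (see the linked repository for that). It assumes that each prompt is already represented by a fixed-size embedding and that each expert has a scalar quality score per training prompt; the class name `RidgeRouter`, its methods, and the toy data are all hypothetical. Under these assumptions, every expert gets its own closed-form ridge solution over a shared regularized Gram matrix, so adding or removing an expert touches only that expert's weight vector.

```python
import numpy as np


class RidgeRouter:
    """Illustrative linear router with one ridge-regression weight vector per expert.

    Assumptions (not taken from the paper): prompts are fixed-size embeddings,
    and each expert has one scalar score per training prompt (higher is better).
    """

    def __init__(self, embeddings, ridge_lambda=1.0):
        self.X = np.asarray(embeddings, dtype=float)
        dim = self.X.shape[1]
        # The regularized Gram matrix depends only on the prompts,
        # so it is computed once and shared by all experts.
        self.gram = self.X.T @ self.X + ridge_lambda * np.eye(dim)
        self.weights = {}  # expert name -> weight vector of shape (dim,)

    def add_expert(self, name, scores):
        """Fit one expert via the closed-form ridge solution; other experts are untouched."""
        y = np.asarray(scores, dtype=float)
        self.weights[name] = np.linalg.solve(self.gram, self.X.T @ y)

    def remove_expert(self, name):
        """Drop one expert; no refitting is needed for the remaining ones."""
        del self.weights[name]

    def route(self, embedding):
        """Return the expert whose predicted score for this prompt is highest."""
        x = np.asarray(embedding, dtype=float)
        return max(self.weights, key=lambda name: float(x @ self.weights[name]))


# Toy data: two-dimensional "embeddings" where each axis stands for one domain.
train_embeddings = [[1, 0], [0, 1], [1, 0], [0, 1]]
router = RidgeRouter(train_embeddings, ridge_lambda=0.1)
router.add_expert("code", [1.0, 0.0, 1.0, 0.0])
router.add_expert("math", [0.0, 1.0, 0.0, 1.0])

print(router.route([1, 0]))  # prompt from the first domain
print(router.route([0, 1]))  # prompt from the second domain

# A weak generalist is added later, then the "code" expert is retired.
router.add_expert("general", [0.2, 0.2, 0.2, 0.2])
router.remove_expert("code")
print(router.route([1, 0]))
```

In this sketch a routing decision costs one dot product per expert, which is one way a linear router can be cheap and fast at inference time. How IR3DE actually builds its features and targets, and how it updates the expert set, is not specified in the abstract.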