Scalable Prompt Routing via Fine-Grained Latent Task Discovery
March 19, 2026
Authors: Yunyi Zhang, Soji Adeshina, Patrick Guan, Ashwin Ganesh, Zhen Han, Vassilis N. Ioannidis, Huzefa Rangwala, George Karypis
cs.AI
Abstract
Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing costs. As model pools scale to include dozens of frontier models with narrow performance gaps, existing approaches face significant challenges: manually defined task taxonomies cannot capture fine-grained capability distinctions, while monolithic routers struggle to differentiate subtle differences across diverse tasks. We propose a two-stage routing architecture that addresses these limitations through automated fine-grained task discovery and task-aware quality estimation. Our first stage employs graph-based clustering to discover latent task types and trains a classifier to assign prompts to discovered tasks. The second stage uses a mixture-of-experts architecture with task-specific prediction heads for specialized quality estimates. At inference, we aggregate predictions from both stages to balance task-level stability with prompt-specific adaptability. Evaluated on 10 benchmarks with 11 frontier models, our method consistently outperforms existing baselines and surpasses the strongest individual model while incurring less than half its cost.
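The abstract does not say how the two stages' predictions are aggregated at inference, so the following is only a minimal sketch under stated assumptions. It assumes the first-stage classifier outputs a probability distribution over discovered tasks, that a per-task average quality is available for each model, and that the second-stage task-specific heads output a per-prompt quality estimate for each model. The function name `route`, the mixing weight `alpha`, and the convex-combination rule are hypothetical and are not taken from the paper.

```python
import numpy as np


def route(task_probs, task_model_quality, head_quality, alpha=0.5):
    """Pick a model index for one prompt (illustrative sketch only).

    task_probs:         shape (T,), assumed stage-1 distribution over tasks
    task_model_quality: shape (T, M), assumed average quality per task and model
    head_quality:       shape (T, M), assumed stage-2 head outputs for this prompt
    alpha:              hypothetical mixing weight, not from the paper
    """
    task_probs = np.asarray(task_probs, dtype=float)
    # Task-level estimate: per-task averages weighted by the task distribution.
    task_level = task_probs @ np.asarray(task_model_quality, dtype=float)
    # Prompt-level estimate: task-specific head outputs gated the same way.
    prompt_level = task_probs @ np.asarray(head_quality, dtype=float)
    # Assumed aggregation: a convex combination of the two estimates.
    scores = alpha * task_level + (1.0 - alpha) * prompt_level
    return int(np.argmax(scores)), scores


# Toy inputs with two discovered tasks and two candidate models.
probs = [0.8, 0.2]
task_avg = [[0.9, 0.6], [0.3, 0.7]]
heads = [[0.5, 0.8], [0.4, 0.6]]
best_model, scores = route(probs, task_avg, heads, alpha=0.5)
```

In this toy setup, a larger `alpha` leans on the stable task-level averages, while a smaller `alpha` lets the prompt-specific head outputs dominate. A cost term could be subtracted from `scores` before the `argmax`, but how the paper accounts for cost is not described in the abstract.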