Scalable Prompt Routing via Fine-Grained Latent Task Discovery
March 19, 2026
Authors: Yunyi Zhang, Soji Adeshina, Patrick Guan, Ashwin Ganesh, Zhen Han, Vassilis N. Ioannidis, Huzefa Rangwala, George Karypis
cs.AI
Abstract
Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing costs. As model pools scale to include dozens of frontier models with narrow performance gaps, existing approaches face significant challenges: manually defined task taxonomies cannot capture fine-grained capability distinctions, while monolithic routers struggle to capture subtle distinctions across diverse tasks. We propose a two-stage routing architecture that addresses these limitations through automated fine-grained task discovery and task-aware quality estimation. Our first stage employs graph-based clustering to discover latent task types and trains a classifier to assign prompts to discovered tasks. The second stage uses a mixture-of-experts architecture with task-specific prediction heads for specialized quality estimates. At inference, we aggregate predictions from both stages to balance task-level stability with prompt-specific adaptability. Evaluated on 10 benchmarks with 11 frontier models, our method consistently outperforms existing baselines and surpasses the strongest individual model while incurring less than half its cost.
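The abstract does not say how the two stages' predictions are combined or how cost enters the decision. The sketch below is only one plausible reading, under explicit assumptions. It takes the stage-1 classifier's task probabilities and per-task mean model quality as the "task-level" estimate. It takes a stage-2 head's per-model prediction as the "prompt-specific" estimate. It blends the two with a hypothetical weight `alpha` and then subtracts a hypothetical linear cost penalty `cost_weight`. The names `Candidate`, `blend_quality`, and `route`, and all numbers, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost: float  # relative cost per query (hypothetical unit)


def blend_quality(task_probs, task_quality, prompt_quality, alpha):
    """Blend task-level and prompt-specific quality estimates for each model.

    task_probs:     {task_id: p(task | prompt)}, assumed to come from the stage-1 classifier
    task_quality:   {task_id: {model: mean quality on prompts of that task}} (assumed statistic)
    prompt_quality: {model: quality predicted by a stage-2 task-specific head}
    alpha:          weight on the task-level estimate (hypothetical knob, not from the paper)
    """
    blended = {}
    for model, q_prompt in prompt_quality.items():
        # Expected task-level quality under the classifier's task distribution.
        q_task = sum(p * task_quality[t][model] for t, p in task_probs.items())
        # Assumed convex combination of the stable and the adaptive estimate.
        blended[model] = alpha * q_task + (1.0 - alpha) * q_prompt
    return blended


def route(task_probs, task_quality, prompt_quality, candidates, alpha=0.5, cost_weight=0.0):
    """Return the candidate with the highest blended quality minus a linear cost penalty."""
    blended = blend_quality(task_probs, task_quality, prompt_quality, alpha)
    return max(candidates, key=lambda c: blended[c.name] - cost_weight * c.cost).name


# Toy numbers, purely illustrative.
candidates = [Candidate("model-a", cost=1.0), Candidate("model-b", cost=0.4)]
task_probs = {"t0": 0.8, "t1": 0.2}
task_quality = {
    "t0": {"model-a": 0.90, "model-b": 0.85},
    "t1": {"model-a": 0.70, "model-b": 0.75},
}
prompt_quality = {"model-a": 0.88, "model-b": 0.86}

print(route(task_probs, task_quality, prompt_quality, candidates))                   # model-a
print(route(task_probs, task_quality, prompt_quality, candidates, cost_weight=0.1))  # model-b
```

With no cost penalty, the pricier model-a wins on blended quality (0.87 vs. 0.845). A cost weight of 0.1 flips the choice to model-b (0.805 vs. 0.77). This illustrates the kind of quality/cost trade-off a router navigates. The paper's actual aggregation and selection rule may differ.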