Scalable Prompt Routing via Fine-Grained Latent Task Discovery

Abstract

Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing cost. As model pools grow to include dozens of frontier models with narrow performance gaps, existing approaches face two significant challenges: manually defined task taxonomies cannot capture fine-grained capability distinctions, and monolithic routers struggle to distinguish subtle differences across diverse tasks. We propose a two-stage routing architecture that addresses these limitations through automated fine-grained task discovery and task-aware quality estimation. The first stage uses graph-based clustering to discover latent task types and trains a classifier to assign prompts to the discovered tasks. The second stage uses a mixture-of-experts architecture with task-specific prediction heads to produce specialized quality estimates. At inference, we aggregate the predictions from both stages to balance task-level stability with prompt-specific adaptability. Evaluated on 10 benchmarks with 11 frontier models, our method consistently outperforms existing baselines and surpasses the strongest individual model while incurring less than half its cost.
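The abstract does not specify how the two stages are combined at inference, so the following is only an illustrative sketch of one possible aggregation. It assumes a convex blend of a task-level quality score (first stage) and a prompt-specific quality estimate (second stage), minus a linear cost penalty. The names `Candidate` and `route`, the parameters `alpha` and `cost_weight`, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost: float  # hypothetical relative cost per query


def route(task_quality, prompt_quality, candidates, alpha=0.5, cost_weight=0.0):
    """Return (model name, score) for the best blended, cost-penalized score.

    task_quality:   per-model quality for the prompt's assigned task (stage 1).
    prompt_quality: per-model prompt-specific quality estimate (stage 2).
    alpha:          assumed blend weight; 1.0 trusts only the task-level score.
    cost_weight:    assumed penalty applied per unit of model cost.
    """
    best_name, best_score = None, float("-inf")
    for c in candidates:
        blended = alpha * task_quality[c.name] + (1 - alpha) * prompt_quality[c.name]
        score = blended - cost_weight * c.cost
        if score > best_score:
            best_name, best_score = c.name, score
    return best_name, best_score


# Made-up numbers for two hypothetical models.
candidates = [Candidate("model_a", cost=1.0), Candidate("model_b", cost=0.4)]
task_quality = {"model_a": 0.80, "model_b": 0.78}
prompt_quality = {"model_a": 0.70, "model_b": 0.76}

blended_choice = route(task_quality, prompt_quality, candidates, alpha=0.5)
task_only_choice = route(task_quality, prompt_quality, candidates, alpha=1.0)
```

In this toy setting the task-level score alone favors the more expensive model, while blending in the prompt-specific estimate shifts the choice to the cheaper one. This is the kind of trade-off between task-level stability and prompt-specific adaptability that the abstract describes, although the actual aggregation rule may differ.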
