ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

November 26, 2025
Authors: Hongjin Su, Shizhe Diao, Ximing Lu, Mingjie Liu, Jiacheng Xu, Xin Dong, Yonggan Fu, Peter Belcak, Hanrong Ye, Hongxu Yin, Yi Dong, Evelina Bakhturina, Tao Yu, Yejin Choi, Jan Kautz, Pavlo Molchanov
cs.AI

Abstract

Large language models are powerful generalists, yet solving deep and complex problems such as those in Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the upper bound of intelligence and improve efficiency in solving difficult agentic tasks. We introduce ToolOrchestra, a method for training small orchestrators that coordinate intelligent tools. ToolOrchestra explicitly uses reinforcement learning with outcome-, efficiency-, and user-preference-aware rewards. Using ToolOrchestra, we produce Orchestrator, an 8B model that achieves higher accuracy at lower cost than previous tool-use agents while aligning with user preferences on which tools are to be used for a given query. On HLE, Orchestrator achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being 2.5x more efficient. On tau2-Bench and FRAMES, Orchestrator surpasses GPT-5 by a wide margin while using only about 30% of the cost. Extensive analysis shows that Orchestrator achieves the best trade-off between performance and cost under multiple metrics, and generalizes robustly to unseen tools. These results demonstrate that composing diverse tools with a lightweight orchestration model is both more efficient and more effective than existing methods, paving the way for practical and scalable tool-augmented reasoning systems.
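The abstract names three reward signals (outcome, efficiency, user preference) without giving their form. The sketch below is a hypothetical illustration of how such signals could be scalarized into one RL reward for a single orchestration episode; it is not the paper's formula. The `Trajectory` fields, the `reward` function, all weights and scales, the choice to give zero reward to failed episodes, and the tool names in the usage example are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Trajectory:
    # Hypothetical record of one orchestration episode (not from the paper).
    correct: bool                     # outcome: did the final answer match the reference?
    cost_usd: float                   # total monetary cost of all model/tool calls
    latency_s: float                  # wall-clock time of the episode in seconds
    tool_calls: Dict[str, int] = field(default_factory=dict)  # calls per tool name


def reward(
    traj: Trajectory,
    preferred_tools: Set[str],
    w_cost: float = 0.5,        # assumed weight on monetary cost
    w_latency: float = 0.1,     # assumed weight on latency
    w_pref: float = 0.2,        # assumed weight on matching user-preferred tools
    cost_scale: float = 1.0,    # assumed normalizer for cost (USD)
    latency_scale: float = 60.0,  # assumed normalizer for latency (seconds)
) -> float:
    """Hypothetical scalarized reward combining outcome, efficiency, and preference."""
    # Assumption: a failed episode earns nothing, so cheap-but-wrong answers
    # cannot be rewarded for their efficiency or tool choice.
    if not traj.correct:
        return 0.0
    r_outcome = 1.0
    # Efficiency terms: penalize normalized cost and latency.
    r_cost = -w_cost * traj.cost_usd / cost_scale
    r_latency = -w_latency * traj.latency_s / latency_scale
    # Preference term: fraction of calls that went to user-preferred tools.
    total_calls = sum(traj.tool_calls.values())
    if total_calls > 0:
        preferred_calls = sum(
            n for tool, n in traj.tool_calls.items() if tool in preferred_tools
        )
        frac = preferred_calls / total_calls
    else:
        frac = 0.0
    r_pref = w_pref * frac
    return r_outcome + r_cost + r_latency + r_pref


if __name__ == "__main__":
    # Hypothetical episode: two web searches and one call to a large model.
    ep = Trajectory(correct=True, cost_usd=0.2, latency_s=30.0,
                    tool_calls={"web_search": 2, "gpt-5": 1})
    print(reward(ep, preferred_tools={"web_search"}))  # about 0.983
```

A weighted sum like this makes the performance/cost trade-off explicit: raising `w_cost` pushes a policy toward cheaper tools at some risk to accuracy, while the gating on `correct` keeps accuracy as the primary objective.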
December 4, 2025