ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

November 26, 2025
Authors: Hongjin Su, Shizhe Diao, Ximing Lu, Mingjie Liu, Jiacheng Xu, Xin Dong, Yonggan Fu, Peter Belcak, Hanrong Ye, Hongxu Yin, Yi Dong, Evelina Bakhturina, Tao Yu, Yejin Choi, Jan Kautz, Pavlo Molchanov
cs.AI

Abstract

Large language models are powerful generalists, yet solving deep and complex problems such as those in Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the upper bound of intelligence and improve efficiency in solving difficult agentic tasks. We introduce ToolOrchestra, a method for training small orchestrators that coordinate intelligent tools. ToolOrchestra explicitly uses reinforcement learning with outcome-, efficiency-, and user-preference-aware rewards. Using ToolOrchestra, we produce Orchestrator, an 8B model that achieves higher accuracy at lower cost than previous tool-use agents while aligning with user preferences on which tools are to be used for a given query. On HLE, Orchestrator achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being 2.5x more efficient. On tau2-Bench and FRAMES, Orchestrator surpasses GPT-5 by a wide margin while using only about 30% of the cost. Extensive analysis shows that Orchestrator achieves the best trade-off between performance and cost under multiple metrics, and generalizes robustly to unseen tools. These results demonstrate that composing diverse tools with a lightweight orchestration model is both more efficient and more effective than existing methods, paving the way for practical and scalable tool-augmented reasoning systems.
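
The abstract names three reward signals (outcome, efficiency, and user preference) but does not give their form. As a rough illustration only, one way such signals might be combined into a single scalar for reinforcement learning is a weighted sum. Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the `Rollout` fields, the budgets, the weights, the tool names, and the linear combination itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rollout:
    """One finished orchestration episode (hypothetical fields)."""
    correct: bool               # did the final answer match the reference?
    cost_usd: float             # total spend on model and tool calls
    latency_s: float            # wall-clock time of the episode
    tool_calls: dict[str, int]  # number of invocations per tool name


def composite_reward(
    rollout: Rollout,
    preferred_tools: set[str],
    cost_budget_usd: float = 1.0,    # assumed normalisation constant
    latency_budget_s: float = 60.0,  # assumed normalisation constant
    w_outcome: float = 1.0,          # assumed weights, not from the paper
    w_efficiency: float = 0.3,
    w_preference: float = 0.2,
) -> float:
    """Weighted sum of outcome, efficiency, and preference terms."""
    # Outcome term: 1 for a correct final answer, 0 otherwise.
    outcome = 1.0 if rollout.correct else 0.0

    # Efficiency term: 1 when free and instant, 0 at or beyond both budgets.
    cost_frac = min(rollout.cost_usd / cost_budget_usd, 1.0)
    latency_frac = min(rollout.latency_s / latency_budget_s, 1.0)
    efficiency = 1.0 - 0.5 * (cost_frac + latency_frac)

    # Preference term: share of calls that went to tools the user prefers.
    total_calls = sum(rollout.tool_calls.values())
    if total_calls == 0:
        preference = 0.0
    else:
        preferred_calls = sum(
            n for tool, n in rollout.tool_calls.items() if tool in preferred_tools
        )
        preference = preferred_calls / total_calls

    return (
        w_outcome * outcome
        + w_efficiency * efficiency
        + w_preference * preference
    )


# Two correct episodes: one cheap and mostly on the preferred tool,
# one expensive and entirely on a tool the user did not ask for.
cheap = Rollout(True, 0.2, 12.0, {"small_model": 3, "large_model": 1})
costly = Rollout(True, 0.9, 54.0, {"large_model": 4})
preferred = {"small_model"}

print(composite_reward(cheap, preferred) > composite_reward(costly, preferred))
```

Under these assumed weights, both episodes earn the same outcome term, so the cheaper, preference-aligned episode scores higher. That is the kind of pressure the abstract describes, although the actual reward design may differ substantially.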