Budget-Aware Tool-Use Enables Effective Agent Scaling
November 21, 2025
Authors: Tengxiao Liu, Zifeng Wang, Jin Miao, I-Hung Hsu, Jun Yan, Jiefeng Chen, Rujun Han, Fangyuan Xu, Yanfei Chen, Ke Jiang, Samira Daruki, Yi Liang, William Yang Wang, Tomas Pfister, Chen-Yu Lee
cs.AI
Abstract
Scaling test-time computation improves the performance of large language models (LLMs) across a range of tasks, and the approach has been extended to tool-augmented agents. For these agents, scaling involves not only "thinking" in tokens but also "acting" via tool calls, and the number of tool calls directly bounds the agent's interaction with the external environment. However, we find that simply granting agents a larger tool-call budget fails to improve performance: because they lack "budget awareness", they quickly hit a performance ceiling. To address this, we study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We first introduce the Budget Tracker, a lightweight plug-in that gives the agent continuous budget awareness and enables simple yet effective scaling. We then develop BATS (Budget Aware Test-time Scaling), a more advanced framework that uses this awareness to adapt its planning and verification strategy dynamically, deciding whether to "dig deeper" on a promising lead or "pivot" to new paths based on the remaining resources. To analyze cost-performance scaling in a controlled manner, we formalize a unified cost metric that jointly accounts for token and tool consumption. We provide the first systematic study of budget-constrained agents, showing that budget-aware methods produce more favorable scaling curves and push the cost-performance Pareto frontier outward. Our work offers empirical insights toward a more transparent and principled understanding of scaling in tool-augmented agents.
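To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tracker that counts tool calls against a fixed budget and produces a status line that could be appended to the agent's context, plus a cost function that combines token and tool consumption linearly. The names `BudgetTracker`, `status_message`, `unified_cost`, and the price parameters are all hypothetical; the abstract specifies neither the tracker's message format nor the exact form of the cost metric.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical tracker: counts tool calls against a fixed budget."""

    total_calls: int      # tool-call budget granted to the agent
    used_calls: int = 0   # tool calls consumed so far

    @property
    def remaining(self) -> int:
        """Number of tool calls still available."""
        return self.total_calls - self.used_calls

    def record_call(self) -> None:
        """Register one tool call; refuse once the budget is exhausted."""
        if self.remaining <= 0:
            raise RuntimeError("tool-call budget exhausted")
        self.used_calls += 1

    def status_message(self) -> str:
        """Status line an agent loop could append after each tool response."""
        return (
            f"Budget: {self.used_calls} of {self.total_calls} tool calls used, "
            f"{self.remaining} remaining."
        )


def unified_cost(
    tokens: int,
    tool_calls: int,
    price_per_token: float,
    price_per_call: float,
) -> float:
    """Assumed linear combination of token cost and tool-call cost."""
    return tokens * price_per_token + tool_calls * price_per_call


# Usage: a budget of three calls, one of which has been spent.
tracker = BudgetTracker(total_calls=3)
tracker.record_call()
print(tracker.status_message())
```

Surfacing the remaining budget as plain text is one simple way to give an agent the kind of continuous budget awareness the abstract describes; how the agent then chooses between digging deeper and pivoting is left to the framework itself.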