Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents
March 13, 2026
Authors: Yushu Li, Wenlong Deng, Jiajin Li, Xiaoxiao Li
cs.AI
Abstract
Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, allowing agents to exhaust token and tool budgets on redundant steps or dead-end trajectories. Existing budget-aware methods either require expensive fine-tuning or rely on coarse, trajectory-level heuristics that cannot intervene mid-execution. We propose the Budget-Aware Value Tree (BAVT), a training-free inference-time framework that models multi-hop reasoning as a dynamic search tree guided by step-level value estimation within a single LLM backbone. Its core innovation is a budget-conditioned node selection mechanism that uses the remaining resource ratio as a natural scaling exponent over node values, providing a principled, parameter-free transition from broad exploration to greedy exploitation as the budget depletes. To combat the well-known overconfidence of LLM self-evaluation, BAVT employs a residual value predictor that scores relative progress rather than absolute state quality, enabling reliable pruning of uninformative or redundant tool calls. We further provide a theoretical convergence guarantee, proving that BAVT reaches a terminal answer with probability at least 1-ε under an explicit finite budget bound. Extensive evaluations on four multi-hop QA benchmarks across two model families demonstrate that BAVT consistently outperforms parallel sampling baselines. Most notably, BAVT under strict low-budget constraints surpasses baseline performance at 4× the resource allocation, establishing that intelligent budget management fundamentally outperforms brute-force compute scaling.
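As a rough illustration of the budget-conditioned node selection idea, the Python sketch below turns node values into selection probabilities that sharpen as the budget runs out. The abstract does not give the exact formula, so this is one plausible reading and not the authors' implementation: it assumes the exponent is the reciprocal of the remaining resource ratio r, so that probabilities are proportional to v^(1/r). The function names, the `eps` floor, and the uniform fallback are hypothetical.

```python
import random


def selection_probs(values, remaining_ratio, eps=1e-6):
    """Hypothetical budget-conditioned selection: p_i proportional to v_i ** (1 / r).

    values          -- non-negative node value estimates (assumed to lie in [0, 1])
    remaining_ratio -- fraction of the budget still available, in (0, 1]
    eps             -- assumed floor that avoids division by zero at r = 0
    """
    if not values:
        raise ValueError("values must be non-empty")
    r = min(max(remaining_ratio, eps), 1.0)
    alpha = 1.0 / r  # assumption: exponent grows as the budget depletes
    vmax = max(values)
    if vmax <= 0.0:
        # No node has positive value: fall back to a uniform choice.
        return [1.0 / len(values)] * len(values)
    # Dividing by vmax keeps every base in [0, 1], so large exponents
    # can only underflow to 0.0 and never overflow.
    weights = [(max(v, 0.0) / vmax) ** alpha for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def select_node(values, remaining_ratio, rng=random):
    """Sample the index of the node to expand next."""
    probs = selection_probs(values, remaining_ratio)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    node_values = [0.2, 0.5, 0.3]
    for ratio in (1.0, 0.5, 0.05):
        print(ratio, [round(p, 4) for p in selection_probs(node_values, ratio)])
```

Under this assumption, a full budget (r = 1) samples nodes in proportion to their values, which keeps exploration broad, while a nearly exhausted budget concentrates almost all probability on the highest-valued node, which approximates greedy exploitation without introducing a separate temperature parameter.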