Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents

Abstract

Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, allowing agents to exhaust token and tool budgets on redundant steps or dead-end trajectories. Existing budget-aware methods either require expensive fine-tuning or rely on coarse, trajectory-level heuristics that cannot intervene mid-execution. We propose the Budget-Aware Value Tree (BAVT), a training-free inference-time framework that models multi-hop reasoning as a dynamic search tree guided by step-level value estimation within a single LLM backbone. Another key innovation is a budget-conditioned node selection mechanism that uses the remaining resource ratio as a natural scaling exponent over node values, providing a principled, parameter-free transition from broad exploration to greedy exploitation as the budget depletes. To combat the well-known overconfidence of LLM self-evaluation, BAVT employs a residual value predictor that scores relative progress rather than absolute state quality, enabling reliable pruning of uninformative or redundant tool calls. We further provide a theoretical convergence guarantee, proving that BAVT reaches a terminal answer with probability at least 1 − ε under an explicit finite budget bound. Extensive evaluations on four multi-hop QA benchmarks across two model families demonstrate that BAVT consistently outperforms parallel sampling baselines. Most notably, BAVT under strict low-budget constraints surpasses the performance that baselines achieve with 4× the resource allocation, establishing that intelligent budget management fundamentally outperforms brute-force compute scaling.
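
The abstract does not spell out the exact selection rule or the interface of the residual predictor. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that the remaining-budget ratio r = B_remaining / B_total enters as an exponent 1/r over node values. Under that assumption, selection weights are value-proportional at full budget and collapse toward the argmax as r approaches 0. It also assumes that the residual predictor returns a scalar progress delta, which is added to the parent's value. The names `Node`, `selection_weights`, `select_node`, `child_value`, and `should_prune`, as well as the pruning threshold, are hypothetical. The LLM calls that would produce the values and residuals are omitted.

```python
import random
from dataclasses import dataclass

EPS = 1e-9  # numerical floor only; not a tunable search parameter


@dataclass
class Node:
    """Hypothetical frontier node; `value` is a step-level score in [0, 1]."""
    name: str
    value: float


def selection_weights(values, remaining, total):
    """Budget-conditioned weights, ASSUMING the form w_i proportional to v_i ** (1 / r).

    r = remaining / total is the remaining-budget ratio. With a full budget
    (r = 1) the weights are proportional to the node values, so weaker
    branches still get explored. As r -> 0 the exponent grows and the
    probability mass concentrates on the highest-valued node (greedy).
    """
    r = max(remaining / total, EPS)
    exponent = 1.0 / r
    values = [max(v, EPS) for v in values]
    v_max = max(values)
    # Dividing by v_max keeps every base in (0, 1], so large exponents
    # underflow harmlessly toward 0 instead of overflowing.
    raw = [(v / v_max) ** exponent for v in values]
    z = sum(raw)
    return [w / z for w in raw]


def select_node(frontier, remaining, total, rng):
    """Sample one frontier node according to the budget-conditioned weights."""
    weights = selection_weights([n.value for n in frontier], remaining, total)
    return rng.choices(frontier, weights=weights, k=1)[0]


def child_value(parent_value, residual):
    """ASSUMED residual update: the predictor scores progress relative to the
    parent, and the child's value is parent value + delta, clipped to [0, 1]."""
    return min(1.0, max(0.0, parent_value + residual))


def should_prune(residual, threshold=0.0):
    """Hypothetical rule: drop steps whose predicted progress is not positive,
    e.g. a redundant tool call that adds no new information."""
    return residual <= threshold


if __name__ == "__main__":
    frontier = [Node("a", 0.8), Node("b", 0.6), Node("c", 0.2)]
    for remaining in (100, 50, 10):
        w = selection_weights([n.value for n in frontier], remaining, 100)
        print(remaining, [round(x, 3) for x in w])
    rng = random.Random(0)
    print(select_node(frontier, remaining=10, total=100, rng=rng).name)
    print(child_value(0.6, 0.15), should_prune(-0.05))
```

Under these assumptions, take node values of 0.8, 0.6, and 0.2. At full budget the weights are 0.5, 0.375, and 0.125. Once only 10% of the budget remains, they are roughly 0.95, 0.05, and 0.00. This illustrates the shift from exploration to exploitation without a temperature or schedule to tune.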