Budget-Aware Tool-Use Enables Effective Agent Scaling

Abstract

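Scaling test-time computation improves the performance of large language models (LLMs) across a variety of tasks, and this approach has been extended to tool-augmented agents. For these agents, scaling involves not only "thinking" in tokens but also "acting" through tool calls. The number of tool calls directly bounds the agent's interaction with the external environment. However, simply granting a larger tool-call budget does not improve performance, because agents lack "budget awareness" and quickly hit a performance ceiling. To address this, we study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We first introduce the Budget Tracker, a lightweight plug-in that gives the agent continuous budget awareness and enables simple yet effective scaling. We further develop BATS (Budget Aware Test-time Scaling), an advanced framework that uses this awareness to decide, based on remaining resources, whether to "dig deeper" on a promising lead or "pivot" to a new path, dynamically adapting its planning and verification strategy. To analyze cost-performance scaling in a controlled manner, we formalize a unified cost metric that jointly accounts for token and tool consumption. We present the first systematic study of budget-constrained agents and show that budget-aware methods produce more favorable scaling curves and push the cost-performance Pareto frontier. Our work offers empirical insights toward a more transparent and principled understanding of scaling in tool-augmented agents.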

English

Scaling test-time computation improves the performance of large language models (LLMs) across a range of tasks, and this approach has also been extended to tool-augmented agents. For these agents, scaling involves not only "thinking" in tokens but also "acting" via tool calls. The number of tool calls directly bounds the agent's interaction with the external environment. However, we find that simply granting agents a larger tool-call budget fails to improve performance, as they lack "budget awareness" and quickly hit a performance ceiling. To address this, we study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We first introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness, enabling simple yet effective scaling. We further develop BATS (Budget Aware Test-time Scaling), an advanced framework that leverages this awareness to dynamically adapt its planning and verification strategy, deciding whether to "dig deeper" on a promising lead or "pivot" to new paths based on remaining resources. To analyze cost-performance scaling in a controlled manner, we formalize a unified cost metric that jointly accounts for token and tool consumption. We provide the first systematic study of budget-constrained agents, showing that budget-aware methods produce more favorable scaling curves and push the cost-performance Pareto frontier. Our work offers empirical insights toward a more transparent and principled understanding of scaling in tool-augmented agents.
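
As a rough illustration of the two ideas named in the abstract, the following minimal Python sketch shows one way a budget tracker and a unified cost could look. It is not the authors' implementation. The class name, the status-line wording, the weighted-sum form of the cost, and the prices are all assumptions made for illustration; the abstract states only that the tracker gives the agent continuous budget awareness and that the metric jointly accounts for token and tool consumption.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical tracker that counts tool calls against a fixed budget."""

    total_calls: int
    used_calls: int = 0

    @property
    def remaining(self) -> int:
        return self.total_calls - self.used_calls

    def record_call(self) -> None:
        # Refuse to go over budget instead of silently overspending.
        if self.remaining <= 0:
            raise RuntimeError("tool-call budget exhausted")
        self.used_calls += 1

    def status_line(self) -> str:
        # Text that could be appended to each tool response so the agent
        # always sees how much budget is left (the wording is an assumption).
        return (
            f"[budget] tool calls used: {self.used_calls}/{self.total_calls}, "
            f"remaining: {self.remaining}"
        )


def unified_cost(
    tokens: int,
    tool_calls: int,
    price_per_token: float,
    price_per_call: float,
) -> float:
    """Assumed weighted-sum cost combining token and tool consumption."""
    return tokens * price_per_token + tool_calls * price_per_call
```

In an agent loop, `record_call()` would run before each tool invocation and `status_line()` would be attached to the tool's output, so the remaining budget stays visible at every step.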

Budget-Aware Tool-Use Enables Effective Agent Scaling
