Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Abstract

Memory is increasingly central to Large Language Model (LLM) agents that operate beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction, which can be inefficient and may discard query-critical information. Runtime memory utilization is a natural alternative, but prior work often incurs substantial overhead and offers little explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router, implemented as a compact neural policy trained with reinforcement learning, performs budget-tier routing across modules to balance task performance against memory construction cost. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., the high-budget setting) and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of the different tiering strategies, clarifying when each axis delivers the most favorable trade-off under varying budget regimes.
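To make the routing idea concrete, the following is a minimal sketch of per-module budget-tier selection with a cost-penalized reward. It is not the BudgetMem implementation. The module names (`filter`, `summarize`), the tier costs, the query features, and the `cost_weight` parameter are all hypothetical, and a linear softmax policy with a plain REINFORCE update stands in for the compact neural policy and the reinforcement learning procedure, which the abstract does not specify.

```python
import math
import random
from dataclasses import dataclass

TIERS = ("low", "mid", "high")
# Hypothetical per-tier costs in arbitrary units; the abstract gives no numbers.
TIER_COST = {"low": 1.0, "mid": 3.0, "high": 9.0}


@dataclass
class Router:
    """Toy linear softmax policy with one weight vector per (module, tier)."""
    modules: list
    n_features: int

    def __post_init__(self):
        # Zero-initialized weights give a uniform distribution over tiers.
        self.w = {
            m: {t: [0.0] * self.n_features for t in TIERS} for m in self.modules
        }

    def probs(self, module, feats):
        """Softmax over tier logits for one module, given query features."""
        logits = [
            sum(wi * x for wi, x in zip(self.w[module][t], feats)) for t in TIERS
        ]
        mx = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - mx) for l in logits]
        z = sum(exps)
        return [e / z for e in exps]

    def route(self, feats, rng):
        """Sample one budget tier per module for this query."""
        return {
            m: rng.choices(TIERS, weights=self.probs(m, feats))[0]
            for m in self.modules
        }


def reward(task_score, plan, cost_weight):
    """Task performance minus weighted total cost of the chosen tiers."""
    total_cost = sum(TIER_COST[t] for t in plan.values())
    return task_score - cost_weight * total_cost


def reinforce_step(router, feats, plan, r, baseline, lr=0.1):
    """One REINFORCE update: advantage times grad of log-prob of each choice."""
    adv = r - baseline
    for m, chosen in plan.items():
        p = router.probs(m, feats)  # computed once, before any weight changes
        for t, pt in zip(TIERS, p):
            g = (1.0 if t == chosen else 0.0) - pt  # d log pi / d logit_t
            router.w[m][t] = [
                wi + lr * adv * g * x for wi, x in zip(router.w[m][t], feats)
            ]
```

In this sketch, raising `cost_weight` penalizes expensive tiers more heavily, which is one simple way to expose an explicit performance-cost knob; how BudgetMem actually weights cost against performance is not stated in the abstract.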
