Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Abstract

Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (Low/Mid/High). A lightweight router, implemented as a compact neural policy trained with reinforcement learning, performs budget-tier routing across modules to balance task performance against memory construction cost. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., the high-budget setting) and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of the different tiering strategies, clarifying when each axis delivers the most favorable trade-off under varying budget regimes.
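To make the routing idea concrete, the following is a minimal sketch of per-module budget-tier selection, not the BudgetMem implementation. Everything named here is an assumption for illustration: the module names (`filter`, `extract`, `summarize`), the relative tier costs, the linear softmax policy, the placeholder query features, and the scalarised reward with its `cost_weight`. The abstract specifies only that each module has Low/Mid/High tiers and that a compact neural policy trained with reinforcement learning routes among them; the policy below is randomly initialised and untrained.

```python
import math
import random

TIERS = ("low", "mid", "high")
# Hypothetical relative costs per tier; the source gives no numbers.
TIER_COST = {"low": 1.0, "mid": 3.0, "high": 9.0}
# Hypothetical module names; the source only says "a set of memory modules".
MODULES = ("filter", "extract", "summarize")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TierRouter:
    """Toy linear policy: query features -> distribution over tiers, per module."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        # One (len(TIERS) x n_features) weight matrix per module, randomly initialised.
        self.weights = {
            m: [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in TIERS]
            for m in MODULES
        }

    def tier_probs(self, module, features):
        logits = [
            sum(w * x for w, x in zip(row, features))
            for row in self.weights[module]
        ]
        return softmax(logits)

    def route(self, features):
        """Greedy routing: pick the most probable tier for each module."""
        plan = {}
        for m in MODULES:
            probs = self.tier_probs(m, features)
            plan[m] = TIERS[probs.index(max(probs))]
        return plan


def plan_cost(plan):
    """Total (assumed) construction cost of a module -> tier assignment."""
    return sum(TIER_COST[tier] for tier in plan.values())


def reward(task_score, plan, cost_weight=0.05):
    """Assumed scalarised objective: task score minus weighted construction cost."""
    return task_score - cost_weight * plan_cost(plan)


router = TierRouter(n_features=4)
query_features = [0.2, 1.0, 0.0, 0.5]  # placeholder for a query representation
plan = router.route(query_features)
```

Under these assumptions, raising `cost_weight` makes low tiers more attractive for the same task score, which is one simple way a single knob could trace out an accuracy-cost frontier. A trained policy would sample tiers during training and update the weights from the reward, rather than routing greedily from random weights as this sketch does.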