Mem-π: Adaptive Memory through Learning When and What to Generate

Abstract

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that are often misaligned with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-π consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.
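
As a rough illustration of the "when and what" interface described above, the following Python sketch shows how a separate memory model's output might be turned into an abstain-or-guide decision before the downstream agent acts. It is not the paper's implementation: the `ABSTAIN` sentinel, `GuidanceDecision`, `parse_memory_output`, `build_agent_prompt`, and the rule-based `toy_memory_model` are all hypothetical names introduced here, and the toy model merely stands in for the learned, RL-trained generator.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sentinel the memory model emits when it chooses to abstain.
ABSTAIN = "<abstain>"


@dataclass
class GuidanceDecision:
    generate: bool                  # the "when" decision
    guidance: Optional[str] = None  # the "what" content, if any


def parse_memory_output(raw: str) -> GuidanceDecision:
    """Split the memory model's raw output into a decision and its content."""
    text = raw.strip()
    if not text or text == ABSTAIN:
        return GuidanceDecision(generate=False)
    return GuidanceDecision(generate=True, guidance=text)


def build_agent_prompt(context: str, memory_model: Callable[[str], str]) -> str:
    """Append generated guidance to the agent context, or leave it unchanged."""
    decision = parse_memory_output(memory_model(context))
    if not decision.generate:
        return context
    return f"{context}\n\nGuidance: {decision.guidance}"


def toy_memory_model(context: str) -> str:
    """Rule-based stand-in (an assumption) for the learned memory model."""
    if "checkout" in context:
        return "Confirm the shipping address before submitting the order."
    return ABSTAIN


if __name__ == "__main__":
    print(build_agent_prompt("Click the search box.", toy_memory_model))
    print(build_agent_prompt("Proceed to checkout.", toy_memory_model))
```

In this sketch the downstream agent sees an unchanged context whenever the memory model abstains, which mirrors the abstract's point that guidance is produced only when it is expected to help.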