Mem-π: Adaptive Memory through Learning When and What to Generate

Abstract

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-π consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.
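The control flow described above (a separate model first decides whether to produce guidance, then what to produce, and the downstream agent only sees guidance when it is generated) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `GuidanceDecision`, `build_agent_prompt`, `toy_guidance_model`, and `step` are hypothetical, and a keyword rule stands in for the learned, RL-trained guidance model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuidanceDecision:
    """Hypothetical container for the two coupled outputs of the guidance model."""
    generate: bool            # the "when" decision: produce guidance or abstain
    guidance: Optional[str]   # the "what" content; None when abstaining


def build_agent_prompt(context: str, decision: GuidanceDecision) -> str:
    """Append guidance to the agent context only if the model chose to generate it."""
    if not decision.generate or not decision.guidance:
        return context  # abstain: the agent sees the unmodified context
    return f"{context}\n\n[Guidance]\n{decision.guidance}"


def toy_guidance_model(context: str) -> GuidanceDecision:
    """Stand-in for the learned model (assumption: a keyword rule, not a trained policy)."""
    if "checkout" in context.lower():
        return GuidanceDecision(True, "Confirm the cart contents before submitting payment.")
    return GuidanceDecision(False, None)


def step(
    context: str,
    guidance_model: Callable[[str], GuidanceDecision],
    agent: Callable[[str], str],
) -> str:
    """One agent step: query the separate guidance model, then call the downstream agent."""
    decision = guidance_model(context)
    return agent(build_agent_prompt(context, decision))


if __name__ == "__main__":
    echo_agent = lambda prompt: prompt  # placeholder agent that returns its prompt
    print(step("Proceed to checkout", toy_guidance_model, echo_agent))
    print(step("Open the terminal", toy_guidance_model, echo_agent))
```

Keeping the guidance model behind a plain callable mirrors the abstract's point that it has its own parameters, separate from the downstream agent; either side could be swapped without touching the other.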