Mem-π: Adaptive Memory through Learning When and What to Generate

Abstract

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-π consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.
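
The generate-or-abstain interface described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `MemoryOutput`, `GuidancePolicy`, `build_agent_input`, and `toy_policy` are hypothetical, the `[guidance]` prompt format is invented for the example, and a keyword rule stands in for the trained guidance model. The RL objective itself is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MemoryOutput:
    """Hypothetical output of the guidance model: a decision plus optional content."""
    generate: bool                  # the "when" decision
    guidance: Optional[str] = None  # the "what" content, used only if generate is True


# Hypothetical type: any callable mapping the agent context to a MemoryOutput.
GuidancePolicy = Callable[[str], MemoryOutput]


def build_agent_input(context: str, policy: GuidancePolicy) -> str:
    """Append generated guidance to the context, or pass it through on abstention."""
    out = policy(context)
    if not out.generate or not out.guidance:
        return context  # abstain: the downstream agent sees the context unchanged
    return f"{context}\n[guidance] {out.guidance}"


def toy_policy(context: str) -> MemoryOutput:
    """Stand-in for the trained model: a keyword rule, purely for illustration."""
    if "checkout" in context:
        return MemoryOutput(True, "Confirm the cart contents before paying.")
    return MemoryOutput(False)
```

Keeping the decision and the content as separate fields mirrors the abstract's description of a model that decides both when and what to generate; how the two are actually parameterized and rewarded is not specified here.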