Mem-π: Adaptive Memory That Learns When and What to Generate

Abstract

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-π consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.
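
To make the "when and what" decision concrete, the following is a minimal sketch of the inference-time control flow only, under stated assumptions. The names `MemoryOutput`, `toy_memory_model`, and `build_agent_input` are hypothetical and do not come from the paper, and the keyword rule merely stands in for the learned guidance model. The decision-content decoupled RL objective is not shown, since its form is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MemoryOutput:
    # Hypothetical container: None means the memory model abstained.
    guidance: Optional[str]


def toy_memory_model(context: str) -> MemoryOutput:
    """Stand-in for the dedicated guidance model.

    Assumption: a keyword rule replaces the learned policy purely for
    illustration; the real model is a trained LM or VLM.
    """
    if "checkout" in context.lower():
        return MemoryOutput("Confirm the shipping address before submitting the order.")
    return MemoryOutput(None)  # abstain: guidance would not help here


def build_agent_input(
    context: str, memory_model: Callable[[str], MemoryOutput]
) -> str:
    """Prepend generated guidance to the agent context, or pass it through unchanged."""
    out = memory_model(context)
    if out.guidance is None:
        return context  # the "when" decision was to abstain
    return f"Guidance: {out.guidance}\n\n{context}"


if __name__ == "__main__":
    print(build_agent_input("Open the home page", toy_memory_model))
    print(build_agent_input("Checkout page loaded", toy_memory_model))
```

In this sketch the downstream agent never queries a memory store; it receives either its unmodified context or the context with a single generated guidance string prepended, which mirrors the separation between the memory model and the agent described above.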