MemTrain: Self-Supervised Context-Memory Training

Abstract

Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and use information accumulated across extended interactions. Existing memory-agent approaches are typically trained end-to-end with reinforcement learning on downstream tasks. However, collecting high-quality annotated problems for memory-intensive scenarios is costly, and the resulting training data often lack the diversity needed to cover general memory behaviors. In this work, we propose MemTrain, a self-supervised training framework that improves the general context-memory capability of LLM agents and thereby makes downstream post-training more effective. MemTrain introduces two coupled proxy tasks over unlabeled Wikipedia corpora: (1) an end-to-end masked reconstruction objective, which requires the model to recover masked entities after multiple rounds of memory updates, encouraging memory maintenance from the perspective of the final outcome; and (2) an intermediate memory recall objective, which requires the model to reconstruct masked historical information from intermediate memory states, encouraging faithful compression and memory completeness throughout the interaction. The two objectives are jointly optimized using GRPO. Extensive experiments on long-text QA and search-based QA benchmarks show that MemTrain consistently improves downstream memory-intensive reasoning performance across different models, achieving gains of up to 17.67 points over direct task-specific post-training.
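
To make the two proxy objectives concrete, the following is a minimal sketch of how their rewards might be combined and turned into group-relative advantages of the kind GRPO uses. It is an illustration under stated assumptions, not the authors' implementation: the exact-match scoring, the `weight` parameter, the `[MASK]` token, and all function names are hypothetical, and the abstract does not specify how the two rewards are actually balanced.

```python
from statistics import mean, pstdev

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def mask_entities(text, entities):
    """Replace every occurrence of each entity string with the mask token."""
    masked = text
    for entity in entities:
        masked = masked.replace(entity, MASK)
    return masked


def recall_reward(predictions, answers):
    """Fraction of masked entities recovered (case-insensitive exact match).

    Exact match is an assumption; missing predictions count as misses.
    """
    if not answers:
        return 0.0
    hits = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return hits / len(answers)


def combined_reward(final_preds, final_answers, step_preds, step_answers, weight=0.5):
    """End-to-end reconstruction reward plus weighted mean intermediate recall.

    `weight` is a hypothetical balancing coefficient.
    """
    final = recall_reward(final_preds, final_answers)
    steps = [recall_reward(p, a) for p, a in zip(step_preds, step_answers)]
    intermediate = mean(steps) if steps else 0.0
    return final + weight * intermediate


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In this sketch, each rollout for the same masked document would receive one combined reward, and the rewards of all rollouts in the group would be normalized together, so a rollout is credited only for doing better than its siblings.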