EvolveMem: An AutoResearch-Based Self-Evolving Memory Architecture for LLM Agents

Abstract

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
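
The closed loop described in the abstract (diagnose failures, propose a configuration change, revert on regression, explore on stagnation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `top_k` configuration key, the failure tags, and the rule-based `toy_diagnose` standing in for the LLM-powered diagnosis module are all hypothetical, chosen only to make the control flow concrete.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def evolve(
    initial: Config,
    evaluate: Callable[[Config], Tuple[float, List[str]]],
    diagnose: Callable[[Config, List[str]], Config],
    explore: Callable[[Config], Config],
    rounds: int = 10,
    patience: int = 2,
) -> Tuple[Config, float]:
    """Guarded evolution loop (illustrative assumption, not the paper's code)."""
    best_cfg = dict(initial)
    best_score, failures = evaluate(best_cfg)
    stale = 0  # consecutive rounds without improvement
    for _ in range(rounds):
        if stale >= patience:
            # Explore-on-stagnation: try a perturbation instead of a diagnosis.
            candidate = explore(best_cfg)
        else:
            # Diagnosis step: propose a targeted change from the failure log.
            candidate = diagnose(best_cfg, failures)
        score, cand_failures = evaluate(candidate)
        if score > best_score:
            best_cfg, best_score, failures = candidate, score, cand_failures
            stale = 0
        else:
            # Revert-on-regression: discard the candidate, keep the best config.
            stale += 1
    return best_cfg, best_score


# Toy stand-ins (hypothetical): the score peaks when top_k == 8.
def toy_evaluate(cfg: Config) -> Tuple[float, List[str]]:
    k = cfg["top_k"]
    score = 1.0 - abs(k - 8) / 10
    if k < 8:
        return score, ["missing_evidence"]
    if k > 8:
        return score, ["too_much_noise"]
    return score, []


def toy_diagnose(cfg: Config, failures: List[str]) -> Config:
    new_cfg = dict(cfg)
    if "missing_evidence" in failures:
        new_cfg["top_k"] += 2  # retrieve more candidates
    elif "too_much_noise" in failures:
        new_cfg["top_k"] -= 1  # retrieve fewer candidates
    return new_cfg


def toy_explore(cfg: Config) -> Config:
    new_cfg = dict(cfg)
    new_cfg["top_k"] += 1  # deterministic perturbation for the demo
    return new_cfg


best_cfg, best_score = evolve(
    {"top_k": 2}, toy_evaluate, toy_diagnose, toy_explore
)
```

In this toy run the loop raises `top_k` from 2 to 8 over three rounds, after which neither further diagnoses nor exploratory perturbations improve the score, so every later candidate is reverted. A real system would replace `toy_diagnose` with a model call over per-question failure logs and `toy_evaluate` with a benchmark run.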