EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

Abstract

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
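The guarded evolution loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`evolve`, `toy_evaluate`, `toy_propose`, `toy_explore`), the configuration keys (`top_k`, `keyword_weight`), the failure tags, and the toy scoring function are all hypothetical, and a rule-based function stands in for the LLM-powered diagnosis module. The sketch only shows the control flow: propose a change from failure logs, keep it if the score improves, revert otherwise, and switch to exploration after repeated stagnation.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
EvalResult = Tuple[float, List[str]]  # (score, failure tags)


def evolve(
    initial: Config,
    evaluate: Callable[[Config], EvalResult],
    propose: Callable[[Config, List[str]], Config],
    explore: Callable[[Config], Config],
    rounds: int = 10,
    patience: int = 2,
) -> Tuple[Config, float, List[Tuple[str, float]]]:
    """Guarded loop: accept improvements, revert regressions, explore on stagnation."""
    best_cfg = dict(initial)
    best_score, best_failures = evaluate(best_cfg)
    history = [("init", best_score)]
    stalled = 0
    for _ in range(rounds):
        if stalled >= patience:
            # Explore-on-stagnation: try a perturbation instead of a diagnosis.
            candidate, kind = explore(best_cfg), "explore"
            stalled = 0
        else:
            candidate, kind = propose(best_cfg, best_failures), "propose"
        score, failures = evaluate(candidate)
        if score > best_score:
            best_cfg, best_score, best_failures = dict(candidate), score, failures
            stalled = 0
            history.append((kind + ":accept", score))
        else:
            # Revert-on-regression: best_cfg is left unchanged.
            stalled += 1
            history.append((kind + ":revert", score))
    return best_cfg, best_score, history


def toy_evaluate(cfg: Config) -> EvalResult:
    """Hypothetical objective that peaks at top_k=8, keyword_weight=0.5."""
    score = 1.0 - 0.05 * abs(cfg["top_k"] - 8) - abs(cfg["keyword_weight"] - 0.5)
    failures = []
    if cfg["top_k"] < 8:
        failures.append("missing_evidence")
    if cfg["top_k"] > 8:
        failures.append("too_much_noise")
    if cfg["keyword_weight"] < 0.5:
        failures.append("exact_term_missed")
    return score, failures


def toy_propose(cfg: Config, failures: List[str]) -> Config:
    """Rule-based stand-in for LLM diagnosis: address the first failure tag."""
    new = dict(cfg)
    if not failures:
        return new
    if failures[0] == "missing_evidence":
        new["top_k"] += 2
    elif failures[0] == "too_much_noise":
        new["top_k"] -= 2
    elif failures[0] == "exact_term_missed":
        new["keyword_weight"] += 0.25
    return new


_rng = random.Random(0)  # seeded so the demo is repeatable


def toy_explore(cfg: Config) -> Config:
    """Perturb one randomly chosen setting."""
    new = dict(cfg)
    if _rng.random() < 0.5:
        new["top_k"] += _rng.choice([-2, 2])
    else:
        new["keyword_weight"] += _rng.choice([-0.25, 0.25])
    return new


best_cfg, best_score, history = evolve(
    {"top_k": 2, "keyword_weight": 0.0}, toy_evaluate, toy_propose, toy_explore
)
print(best_cfg, best_score)
for step in history:
    print(step)
```

In this toy setting the proposals climb to the optimum within a few rounds; once no failures remain, proposals stop improving the score, the stagnation counter reaches `patience`, and the loop tries exploratory perturbations that are reverted because they score lower. A real system would replace `toy_evaluate` with a benchmark run and `toy_propose` with a model call that reads per-question failure logs.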