EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

Abstract

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat the retrieval infrastructure as fixed: stored content evolves, while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space, which an LLM-powered diagnosis module optimizes. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer then applies them, automatically reverting when performance regresses and exploring when performance stagnates. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously runs iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously and discovers effective retrieval strategies, including entirely new configuration dimensions that were not present in the original action space. On LoCoMo, EvolveMem achieves a 25.7% relative improvement over the strongest baseline and a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem achieves an 18.9% relative improvement over the strongest baseline. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
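The evolution round described in the abstract (diagnose failures, propose an adjustment, apply it with revert-on-regression and explore-on-stagnation) can be illustrated with a minimal sketch. This is not the EvolveMem implementation: the names `evolve`, `evaluate`, `diagnose`, `explore`, the `patience` threshold, and the single `top_k` configuration field are all hypothetical, and the LLM-powered diagnosis module is replaced by a toy rule so that the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]  # hypothetical retrieval configuration


def evolve(
    initial: Config,
    evaluate: Callable[[Config], Tuple[float, List[str]]],
    diagnose: Callable[[Config, List[str]], Config],
    explore: Callable[[Config], Config],
    rounds: int = 10,
    patience: int = 2,
) -> Tuple[Config, float]:
    """Guarded loop: keep a candidate only if it improves the score."""
    best_cfg = dict(initial)
    best_score, failures = evaluate(best_cfg)
    stalled = 0  # consecutive rounds without improvement
    for _ in range(rounds):
        if stalled >= patience:
            # Explore-on-stagnation: try a change not driven by the failure log.
            candidate = explore(best_cfg)
            stalled = 0
        else:
            # Diagnosis step: propose a targeted change from the failure log.
            candidate = diagnose(best_cfg, failures)
        score, cand_failures = evaluate(candidate)
        if score > best_score:
            best_cfg, best_score, failures = candidate, score, cand_failures
            stalled = 0
        else:
            # Revert-on-regression: discard the candidate, keep best_cfg.
            stalled += 1
    return best_cfg, best_score


# Toy stand-ins (assumptions for illustration only).
def toy_evaluate(cfg: Config) -> Tuple[float, List[str]]:
    k = cfg["top_k"]
    score = 1 - abs(k - 5) / 10  # pretend top_k = 5 is ideal
    if k < 5:
        return score, ["too_few_results"]
    if k > 5:
        return score, ["too_many_results"]
    return score, []


def toy_diagnose(cfg: Config, failures: List[str]) -> Config:
    step = {"too_few_results": 1, "too_many_results": -1}
    delta = sum(step.get(f, 0) for f in failures)
    return {**cfg, "top_k": cfg["top_k"] + delta}


def toy_explore(cfg: Config) -> Config:
    return {**cfg, "top_k": cfg["top_k"] + 1}


best_cfg, best_score = evolve({"top_k": 1}, toy_evaluate, toy_diagnose, toy_explore)
print(best_cfg, best_score)
```

In this toy setting the loop climbs from the minimal configuration to the ideal one and then rejects every later candidate, since none scores strictly higher. A real system would log the rejected candidates as well, because the abstract describes diagnosis over per-question failure logs rather than a single aggregate score.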