EvolveMem: A Self-Evolving Memory Architecture for LLM Agents via AutoResearch

Abstract

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously and discovers effective retrieval strategies, including entirely new configuration dimensions absent from the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. When evolved configurations are moved across benchmarks, the transfer is positive rather than catastrophic, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
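The guarded evolution loop described above (diagnose, apply, revert on regression, explore on stagnation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the EvolveMem implementation: the retrieval configuration is assumed to be a flat dict of numeric knobs, `evaluate` stands in for benchmark scoring, `diagnose` stands in for the LLM-powered diagnosis module (here a fixed heuristic), and the names `evolve`, `patience`, `top_k`, and `bm25_weight` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a retrieval configuration is a flat dict of knobs.
Config = Dict[str, float]


def evolve(
    initial: Config,
    evaluate: Callable[[Config], float],
    diagnose: Callable[[Config, float], Config],
    explore: Callable[[Config, random.Random], Config],
    rounds: int = 10,
    patience: int = 2,
    seed: int = 0,
) -> Tuple[Config, float, List[Tuple[int, str, float]]]:
    """Guarded loop: accept improvements, revert otherwise, and switch
    to exploration after `patience` consecutive non-improving rounds."""
    rng = random.Random(seed)
    best_cfg = dict(initial)
    best_score = evaluate(best_cfg)
    stagnant = 0  # consecutive rounds without improvement
    log: List[Tuple[int, str, float]] = []

    for r in range(1, rounds + 1):
        if stagnant >= patience:
            # Explore-on-stagnation: a random perturbation replaces the diagnosis.
            candidate = explore(best_cfg, rng)
            action = "explore"
            stagnant = 0
        else:
            # The diagnosis proposes a partial update merged into the best config.
            candidate = {**best_cfg, **diagnose(best_cfg, best_score)}
            action = "diagnose"

        score = evaluate(candidate)
        if score > best_score:
            best_cfg, best_score = candidate, score
            stagnant = 0
            log.append((r, action + ":accept", score))
        else:
            # Revert: discard the candidate and keep the previous best config.
            stagnant += 1
            log.append((r, action + ":revert", score))
    return best_cfg, best_score, log


# Hypothetical toy stand-ins: a scoring surface with its optimum at
# top_k=8, bm25_weight=0.5, a fixed "diagnosis" heuristic, and random exploration.
def toy_evaluate(cfg: Config) -> float:
    return -abs(cfg["top_k"] - 8) - abs(cfg["bm25_weight"] - 0.5)


def toy_diagnose(cfg: Config, score: float) -> Config:
    return {"top_k": cfg["top_k"] + 2}


def toy_explore(cfg: Config, rng: random.Random) -> Config:
    return {**cfg, "bm25_weight": round(rng.uniform(0.0, 1.0), 2)}


best_cfg, best_score, log = evolve(
    {"top_k": 2, "bm25_weight": 0.0}, toy_evaluate, toy_diagnose, toy_explore
)
for entry in log:
    print(entry)
print(best_cfg, best_score)
```

With these stand-ins, the loop raises `top_k` through accepted diagnosis steps, reverts the overshoot past 8, and after two stagnant rounds switches to exploring `bm25_weight`. Rejecting every non-improving candidate is a deliberately conservative choice for the sketch; a real system might tolerate small regressions or score candidates on held-out questions.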