MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems
May 27, 2026
Authors: Xinle Deng, Ruobin Zhong, Hujin Peng, Xiaoben Lu, Yanzhe Wu, Guang Li, Buqiang Xu, Yunzhi Yao, Jizhan Fang, Haoliang Cao, Junjie Guo, Yuan Yuan, Ziqing Ma, Yuanqiang Yu, Rui Hu, Baohua Dong, Hangcheng Zhu, Ningyu Zhang
cs.AI
Abstract
Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing the dynamic evolution of memory is crucial for understanding how information is synthesized, propagated, or corrupted over time. In this work, we study the new problem of error tracing and attribution in LLM memory systems. We propose a novel framework that transforms memory pipelines into executable memory evolution graphs, enabling fine-grained tracing of operational information flow. We then construct MemTraceBench, a benchmark collected from representative memory systems such as Long-Context, RAG, Mem0, and EverMemOS, to systematically study memory failure modes. We further introduce an automatic attribution method that iteratively traces operation subgraphs to pinpoint the root cause of any failed case. Our analysis reveals that memory failures are systematic, stemming from operation-level issues such as information loss and retrieval misalignment. Crucially, we leverage these fine-grained attribution signals to guide downstream prompt optimization, establishing a closed-loop system that automatically corrects faults and boosts end-task performance by up to 7.62%. Code will be released at https://github.com/zjunlp/MemTrace.
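To make the idea of tracing a failure back through a memory evolution graph concrete, the following is a minimal illustrative sketch, not the authors' implementation. The `OpNode` schema, the `trace_root_cause` function, and the substring check used to decide whether a fact survived an operation are all assumptions introduced here; the abstract does not specify the graph format or the attribution test, and a real system would likely use a more robust judgment than string matching.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OpNode:
    """One memory operation in the graph (hypothetical schema)."""
    op_id: str
    kind: str                                        # e.g. "write", "summarize", "retrieve"
    inputs: list[str] = field(default_factory=list)  # ids of upstream operations
    output: str = ""                                 # text this operation produced


def trace_root_cause(graph: dict[str, OpNode], failed_id: str, fact: str) -> str | None:
    """Walk backward from a failed node and return the operation that lost `fact`.

    An operation is blamed when at least one of its inputs still contains the
    fact but its own output does not. Substring matching is a toy stand-in
    (assumption) for whatever check a real attribution method applies.
    """
    frontier = [failed_id]
    visited: set[str] = set()
    while frontier:
        node = graph[frontier.pop(0)]
        if node.op_id in visited:
            continue
        visited.add(node.op_id)
        parents = [graph[i] for i in node.inputs]
        if fact not in node.output and any(fact in p.output for p in parents):
            return node.op_id
        frontier.extend(p.op_id for p in parents)
    return None  # the fact never entered the graph, or was never lost


# Toy pipeline: the summarization step drops the city name.
graph = {
    "w1": OpNode("w1", "write", [], "Alice moved to Berlin in 2021"),
    "s1": OpNode("s1", "summarize", ["w1"], "Alice moved abroad"),
    "r1": OpNode("r1", "retrieve", ["s1"], "Alice moved abroad"),
    "a1": OpNode("a1", "answer", ["r1"], "Unknown"),
}
root = trace_root_cause(graph, "a1", "Berlin")
```

In this toy graph the trace starts at the failed answer `a1`, steps back through `r1`, and stops at `s1`, the first operation whose input still mentioned "Berlin" while its output did not. This corresponds loosely to the "information loss" failure type named in the abstract.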