EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
April 30, 2026
Authors: Yuyang Li, Yime He, Zeyu Zhang, Dong Gong
cs.AI
Abstract
Long-term conversational memory requires retrieving evidence scattered across multiple sessions, yet single-pass retrieval fails on temporal and multi-hop questions. Existing iterative methods refine queries via generated content or document-level signals, but none explicitly diagnoses the evidence gap, namely what is missing from the accumulated retrieval set, leaving query refinement untargeted. We present EviMem, combining IRIS (Iterative Retrieval via Insufficiency Signals), a closed-loop framework that detects evidence gaps through sufficiency evaluation, diagnoses what is missing, and drives targeted query refinement, with LaceMem (Layered Architecture for Conversational Evidence Memory), a coarse-to-fine memory hierarchy supporting fine-grained gap diagnosis. On LoCoMo, EviMem improves Judge Accuracy over MIRIX on temporal (73.3% to 81.6%) and multi-hop (65.9% to 85.2%) questions at 4.5x lower latency. Code: https://github.com/AIGeeksGroup/EviMem.
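The closed loop the abstract describes — retrieve, evaluate sufficiency, diagnose the evidence gap, refine the query toward what is missing — can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names (`retrieve`, `evaluate_sufficiency`, `iris_loop`) and the keyword-coverage sufficiency check are placeholders for EviMem's LLM-based evaluation over the LaceMem hierarchy.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    sufficient: bool
    missing_terms: list = field(default_factory=list)

def evaluate_sufficiency(question_terms, evidence):
    # Toy stand-in for sufficiency evaluation: the accumulated evidence is
    # deemed sufficient when every question term appears in some snippet.
    covered = set()
    for snippet in evidence:
        covered |= {t for t in question_terms if t in snippet}
    missing = [t for t in question_terms if t not in covered]
    return Verdict(sufficient=not missing, missing_terms=missing)

def retrieve(memory, query_terms, k=2):
    # Toy keyword retriever over a flat list of memory snippets.
    scored = sorted(memory, key=lambda s: -sum(t in s for t in query_terms))
    return scored[:k]

def iris_loop(question_terms, memory, max_rounds=3):
    query_terms = list(question_terms)
    evidence = []
    for _ in range(max_rounds):
        for snippet in retrieve(memory, query_terms):
            if snippet not in evidence:
                evidence.append(snippet)
        verdict = evaluate_sufficiency(question_terms, evidence)
        if verdict.sufficient:  # evidence gap closed: stop early
            break
        # Targeted refinement: the next query asks only for what is
        # still missing, rather than restating the whole question.
        query_terms = verdict.missing_terms
    return evidence

memory = [
    "alice adopted a dog in may",
    "the dog is named rex",
    "alice moved to berlin in june",
]
evidence = iris_loop(["dog", "berlin"], memory)
```

In this toy run, the first retrieval covers only "dog"; the diagnosed gap ("berlin") drives a second, targeted retrieval that completes the evidence set — mirroring how multi-hop questions need evidence from multiple sessions.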