

δ-mem: Efficient Online Memory for Large Language Models

May 12, 2026
Authors: Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria
cs.AI

Abstract

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
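
To make the described mechanism concrete, the sketch below shows a delta-rule write into a fixed-size associative state and a readout that feeds a low-rank additive correction to a frozen attention output. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the projections `W_q` and `U`, the write strength `beta`, and the model width are all hypothetical.

```python
import numpy as np

def delta_write(S, k, v, beta=0.5):
    """Delta-rule update: nudge the association S @ k toward the target value v.

    S    : (d, d) fixed-size associative memory state
    k, v : (d,) key / value vectors for the new token
    beta : write strength of the online update (assumed hyperparameter)
    """
    pred = S @ k                          # what the state currently retrieves for key k
    return S + beta * np.outer(v - pred, k)

def delta_read(S, q):
    """Readout: retrieve the value the state associates with query q."""
    return S @ q

d, d_model = 8, 32                        # 8x8 state as in the abstract; model width is illustrative
rng = np.random.default_rng(0)
S = np.zeros((d, d))

# Write a short stream of key/value pairs online.
for _ in range(16):
    k = rng.standard_normal(d)
    k /= np.linalg.norm(k)                # unit-norm keys keep the rank-one update well scaled
    v = rng.standard_normal(d)
    S = delta_write(S, k, v)

# Hypothetical low-rank additive correction to a frozen attention output `o`:
# a small down-projection forms the query, and the readout is projected back up.
W_q = 0.1 * rng.standard_normal((d, d_model))   # query projection (assumed)
U = 0.01 * rng.standard_normal((d_model, d))    # output projection (assumed)

h = rng.standard_normal(d_model)          # current hidden state (stand-in)
o = rng.standard_normal(d_model)          # frozen backbone attention output (stand-in)
o_corrected = o + U @ delta_read(S, W_q @ h)    # low-rank correction added to attention output
print(o_corrected.shape)                  # (32,)
```

Because each write is a rank-one outer-product correction toward the target value, the state remains a fixed 8×8 matrix regardless of how long the history grows, which is what keeps the memory's cost constant while the backbone itself stays frozen.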