

δ-mem: Efficient Online Memory for Large Language Models

May 12, 2026
作者: Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria
cs.AI

Abstract

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
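
The core mechanism the abstract describes, a fixed-size state matrix written with the delta rule and read out per query, can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the class name `DeltaMemorySketch`, the write strength `beta`, the unit-norm keys, and the additive coupling of the readout to the attention output are all expository assumptions; only the delta-rule update and linear readout follow the standard formulation referenced in the abstract.

```python
import torch

class DeltaMemorySketch:
    """Minimal sketch of a delta-rule associative memory.

    Illustrative only: the state shape, write rule, and readout follow
    the generic delta-rule formulation; the paper's exact
    parameterization and its coupling to attention may differ.
    """

    def __init__(self, d: int = 8, beta: float = 0.5):
        self.S = torch.zeros(d, d)  # fixed-size online state (e.g., 8x8)
        self.beta = beta            # write strength (hypothetical default)

    def write(self, k: torch.Tensor, v: torch.Tensor) -> None:
        # Delta rule: nudge the stored association S @ k toward the
        # target value v, rather than purely accumulating outer
        # products as a Hebbian update would.
        pred = self.S @ k
        self.S = self.S + self.beta * torch.outer(v - pred, k)

    def read(self, q: torch.Tensor) -> torch.Tensor:
        # Readout: retrieve the value currently associated with query q.
        return self.S @ q

# Hypothetical usage: write one association, read it back, and apply the
# readout as an additive low-rank correction to an attention output (the
# exact coupling in δ-mem is not specified at the abstract's level of
# detail).
mem = DeltaMemorySketch(d=8)
k = torch.randn(8)
k = k / k.norm()                     # unit-norm key (assumption)
v = torch.randn(8)
mem.write(k, v)
attn_out = torch.randn(8)
corrected = attn_out + mem.read(k)   # additive correction (assumption)
```

Because the state is a fixed d×d matrix, the memory's cost is constant in sequence length, which is what lets a tiny 8×8 state augment a frozen backbone without any explicit context extension.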