δ-mem: Efficient Online Memory for Large Language Models

Abstract

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
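To make the phrase "fixed-size state matrix updated by delta-rule learning" concrete, the following is a minimal sketch of the generic textbook delta rule for an associative memory, S ← S + β (v − S k) kᵀ, with readout S q. It is not the δ-mem implementation: the function names, the step size `beta`, the use of one-hot keys, and the choice of plain Python lists are all assumptions made for illustration, and the low-rank correction to attention is omitted because the abstract does not specify its form.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(S: Matrix, x: Vector) -> Vector:
    """Multiply matrix S by vector x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def delta_update(S: Matrix, k: Vector, v: Vector, beta: float = 1.0) -> Matrix:
    """Generic delta rule: S + beta * (v - S k) k^T (hypothetical helper).

    The state is corrected by the error between the target value v and
    what the memory currently returns for key k.
    """
    predicted = matvec(S, k)
    error = [v_i - p_i for v_i, p_i in zip(v, predicted)]
    return [
        [S[i][j] + beta * error[i] * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


def read(S: Matrix, q: Vector) -> Vector:
    """Read from the memory with query q."""
    return matvec(S, q)


d = 8  # matches the 8x8 state size mentioned in the abstract
S = [[0.0] * d for _ in range(d)]

# Assumed toy data: two orthogonal unit-norm keys.
k1 = [1.0] + [0.0] * (d - 1)
k2 = [0.0, 1.0] + [0.0] * (d - 2)
v1 = [float(i) for i in range(d)]
v2 = [1.0] * d
v3 = [2.0] * d

S = delta_update(S, k1, v1)  # store v1 under k1
S = delta_update(S, k2, v3)  # store v3 under k2
S = delta_update(S, k1, v2)  # overwrite k1 with v2

out_k1 = read(S, k1)
out_k2 = read(S, k2)
```

With unit-norm keys and `beta = 1.0`, each update makes the memory return exactly the new value for that key, so the second write to `k1` replaces the first while the entry for the orthogonal key `k2` is untouched. The state stays 8×8 regardless of how many writes occur, which is the property the abstract relies on.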
