δ-mem: Efficient Online Memory for Large Language Models

Abstract

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
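
To make the phrase "fixed-size state matrix updated by delta-rule learning" concrete, the following is a minimal sketch of a generic delta-rule associative memory, not the authors' implementation. The function names, the write strength `beta`, and the one-hot toy keys are assumptions introduced for illustration; the abstract does not specify how keys and values are derived from the backbone or how the readout becomes a low-rank attention correction, so that step is omitted.

```python
# Illustrative sketch of a generic delta-rule associative memory.
# All names (new_state, read, delta_update, beta) and the toy data are
# hypothetical; only the fixed 8x8 state size comes from the abstract.

D = 8  # the state is a D x D matrix that never grows with history length


def new_state(d=D):
    """Return an empty (all-zero) d x d memory state."""
    return [[0.0] * d for _ in range(d)]


def read(state, key):
    """Readout: the matrix-vector product state @ key."""
    return [sum(row[j] * key[j] for j in range(len(key))) for row in state]


def delta_update(state, key, value, beta=1.0):
    """Delta rule: S <- S + beta * (value - S @ key) key^T.

    The state is corrected only by its prediction error for this key,
    so rewriting a key replaces the old association instead of adding to it.
    """
    pred = read(state, key)
    err = [v - p for v, p in zip(value, pred)]
    return [
        [state[i][j] + beta * err[i] * key[j] for j in range(len(key))]
        for i in range(len(value))
    ]


def one_hot(i, d=D):
    """Unit-norm toy key; real keys would come from the model."""
    return [1.0 if j == i else 0.0 for j in range(d)]


state = new_state()
k0, k1 = one_hot(0), one_hot(1)

# Store two associations, then overwrite the first one.
state = delta_update(state, k0, [float(i) for i in range(D)])
state = delta_update(state, k1, [1.0] * D)
state = delta_update(state, k0, [2.0] * D)

recalled_k0 = read(state, k0)  # reflects the most recent write for k0
recalled_k1 = read(state, k1)  # unaffected by the overwrite of k0
```

With unit-norm, mutually orthogonal keys and `beta=1.0`, each write is recalled exactly and an overwrite leaves other associations untouched. With correlated keys or smaller `beta`, recall becomes approximate, which is the trade-off a compressed fixed-size state accepts.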
