Augmenting Language Models with Long-Term Memory

June 12, 2023
Authors: Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
cs.AI

Abstract

Existing large language models (LLMs) can only accept fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can thus memorize long past context and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks. Typically, LongMem can enlarge the long-form memory to 65k tokens and thus cache many extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models memorize and utilize long-form content. Our code is open-sourced at https://aka.ms/LongMem.
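
To make the decoupled-memory idea in the abstract concrete, here is a minimal, self-contained sketch: a frozen backbone is assumed to produce key/value states for past context, which are cached in a memory bank; a lightweight retriever then looks up the top-k most relevant cached entries for each new query and fuses them via attention. All names here (MemoryBank, fuse_with_memory, the 65k budget default, etc.) are illustrative assumptions, not the authors' released implementation at https://aka.ms/LongMem.

```python
# Hypothetical sketch of caching and retrieving long-term memory, assuming
# key/value states come from a frozen backbone LLM (not shown here).
import torch
import torch.nn.functional as F


class MemoryBank:
    """Caches key/value states produced by a frozen backbone LLM."""

    def __init__(self, dim: int, max_size: int = 65_536):
        self.keys = torch.empty(0, dim)    # cached attention keys
        self.values = torch.empty(0, dim)  # cached attention values
        self.max_size = max_size           # e.g. the 65k-token budget mentioned in the abstract

    def cache(self, keys: torch.Tensor, values: torch.Tensor) -> None:
        """Append new key/value states; evict the oldest entries beyond the budget."""
        self.keys = torch.cat([self.keys, keys.detach()], dim=0)[-self.max_size:]
        self.values = torch.cat([self.values, values.detach()], dim=0)[-self.max_size:]

    def retrieve(self, queries: torch.Tensor, top_k: int = 8):
        """Return, for each query, the top-k most similar cached key/value pairs."""
        scores = queries @ self.keys.T                      # (num_queries, num_cached)
        idx = scores.topk(min(top_k, self.keys.size(0)), dim=-1).indices
        return self.keys[idx], self.values[idx]             # each (num_queries, top_k, dim)


def fuse_with_memory(queries: torch.Tensor, bank: MemoryBank, top_k: int = 8) -> torch.Tensor:
    """Attention over retrieved memory, standing in for the side-network memory reader."""
    mem_k, mem_v = bank.retrieve(queries, top_k)            # (Q, k, D)
    logits = (mem_k @ queries.unsqueeze(-1)).squeeze(-1)    # (Q, k) similarity scores
    attn = F.softmax(logits / queries.size(-1) ** 0.5, dim=-1)
    return (attn.unsqueeze(-1) * mem_v).sum(dim=1)          # (Q, D) memory-augmented states


if __name__ == "__main__":
    dim = 64
    bank = MemoryBank(dim)
    # Pretend these key/value states came from a frozen backbone encoding past context.
    bank.cache(torch.randn(128, dim), torch.randn(128, dim))
    queries = torch.randn(4, dim)
    print(fuse_with_memory(queries, bank).shape)  # torch.Size([4, 64])
```

Because the cached keys and values come from a frozen encoder, the bank can grow or be refreshed without retraining it, which is the property the abstract refers to as avoiding memory staleness.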