Augmenting Language Models with Long-Term Memory
June 12, 2023
Authors: Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
cs.AI
Abstract
Existing large language models (LLMs) can only afford fixed-size inputs due to
the input length limit, preventing them from utilizing rich long-context
information from past inputs. To address this, we propose a framework, Language
Models Augmented with Long-Term Memory (LongMem), which enables LLMs to
memorize long history. We design a novel decoupled network architecture with
the original backbone LLM frozen as a memory encoder and an adaptive residual
side-network as a memory retriever and reader. Such a decoupled memory design
can easily cache and update long-term past contexts for memory retrieval
without suffering from memory staleness. Enhanced with memory-augmented
adaptation training, LongMem can thus memorize long past context and use
long-term memory for language modeling. The proposed memory retrieval module
can handle unlimited-length context in its memory bank to benefit various
downstream tasks. Typically, LongMem can enlarge the long-form memory to 65k
tokens and thus cache many-shot extra demonstration examples as long-form
memory for in-context learning. Experiments show that our method outperforms
strong long-context models on ChapterBreak, a challenging long-context modeling
benchmark, and achieves remarkable improvements on memory-augmented in-context
learning over LLMs. The results demonstrate that the proposed method is
effective in helping language models memorize and utilize long-form content.
Our code is open-sourced at https://aka.ms/LongMem.