Augmenting Language Models with Long-Term Memory
June 12, 2023
Authors: Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
cs.AI
Abstract
Existing large language models (LLMs) can only afford fixed-size inputs due to
the input length limit, preventing them from utilizing rich long-context
information from past inputs. To address this, we propose a framework, Language
Models Augmented with Long-Term Memory (LongMem), which enables LLMs to
memorize long history. We design a novel decoupled network architecture with
the original backbone LLM frozen as a memory encoder and an adaptive residual
side-network as a memory retriever and reader. Such a decoupled memory design
can easily cache and update long-term past contexts for memory retrieval
without suffering from memory staleness. Enhanced with memory-augmented
adaptation training, LongMem can thus memorize long past context and use
long-term memory for language modeling. The proposed memory retrieval module
can handle unlimited-length context in its memory bank to benefit various
downstream tasks. Typically, LongMem can enlarge the long-form memory to 65k
tokens and thus cache many-shot extra demonstration examples as long-form
memory for in-context learning. Experiments show that our method outperforms
strong long-context models on ChapterBreak, a challenging long-context modeling
benchmark, and achieves remarkable improvements on memory-augmented in-context
learning over LLMs. The results demonstrate that the proposed method is
effective in helping language models memorize and utilize long-form content.
Our code is open-sourced at https://aka.ms/LongMem.