Augmenting Language Models with Long-Term Memory

Abstract

Existing large language models (LLMs) can only process fixed-size inputs because of their input length limit, which prevents them from using the rich long-context information in past inputs. To address this, we propose Language Models Augmented with Long-Term Memory (LongMem), a framework that enables LLMs to memorize a long history. We design a novel decoupled network architecture in which the original backbone LLM is frozen and serves as a memory encoder, while an adaptive residual side-network serves as a memory retriever and reader. This decoupled memory design makes it easy to cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can memorize long past contexts and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank, which benefits various downstream tasks. In particular, LongMem can enlarge the long-form memory to 65k tokens and thus cache many-shot extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models memorize and use long-form content. Our code is open-sourced at https://aka.ms/LongMem.
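
The idea of caching past contexts in a memory bank and retrieving from it can be pictured with a small sketch. The abstract does not specify the memory bank's data layout, its eviction policy, or its similarity function, so everything below is an assumption made for illustration and not the authors' implementation: the class name `MemoryBank`, the key-value layout, the first-in-first-out eviction, and the dot-product scoring are all hypothetical. Keys stand in for representations that a frozen encoder might produce, and values stand in for the cached context they index.

```python
from collections import deque
import heapq


class MemoryBank:
    """Hypothetical bounded cache of (key, value) pairs with top-k lookup."""

    def __init__(self, capacity):
        # Assumption: a deque with maxlen drops the oldest entry when full,
        # giving first-in-first-out eviction.
        self.entries = deque(maxlen=capacity)

    def cache(self, keys, values):
        # Store pairs that a frozen encoder would have produced for past context.
        for key, value in zip(keys, values):
            self.entries.append((list(key), value))

    def retrieve(self, query, k):
        # Assumption: score each cached key by its dot product with the query
        # and return the values of the k highest-scoring entries.
        def score(entry):
            key, _ = entry
            return sum(q * x for q, x in zip(query, key))

        best = heapq.nlargest(k, self.entries, key=score)
        return [value for _, value in best]


bank = MemoryBank(capacity=3)
bank.cache([[1.0, 0.0], [0.0, 1.0]], ["chunk-a", "chunk-b"])
# Caching two more entries exceeds the capacity, so the oldest one is evicted.
bank.cache([[0.7, 0.7], [-1.0, 0.0]], ["chunk-c", "chunk-d"])
top = bank.retrieve([1.0, 0.2], k=2)
print(top)
```

Because the bank only stores what the encoder already produced, new context can be appended and old context dropped without recomputing anything that is cached. This is one plausible reading of why a frozen encoder avoids stale memory entries.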
