Language Models Need Sleep

Abstract

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length, because both the key-value (KV) cache and the per-token attention cost grow with the number of tokens in context. To address this, we study a sleep-like consolidation mechanism in which the model periodically converts recent context into persistent fast weights and then clears its KV cache. During sleep, the model performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. At inference time, this shifts extra computation into sleep while preserving wake-time prediction latency. We evaluate the method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, and on a realistic math reasoning task; on these tasks, standard transformers and SSM-attention hybrid models fail. We further show that increasing the sleep duration N improves performance, with the largest gains on examples that require deeper reasoning.
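To make the wake/sleep split concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single linear fast-weight matrix `W` in place of the SSM blocks, and it substitutes a fixed delta rule, `W <- W + lr * (v - W k) k^T`, for the paper's learned local rule. All names (`sleep`, `delta_update`, `recall_error`) are hypothetical. Sleep replays the cached (key, value) pairs `n_passes` times and then clears the cache. A wake-time read is a single matrix-vector product whose cost does not depend on how many tokens were consolidated.

```python
# Toy sketch (hypothetical): a fixed delta rule stands in for the paper's
# learned local rule, and one linear fast-weight matrix stands in for SSM blocks.

def matvec(W, x):
    """Wake-time read: W @ x, cost O(d^2) regardless of context length."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def delta_update(W, k, v, lr):
    """Delta rule: W <- W + lr * (v - W k) k^T."""
    pred = matvec(W, k)
    err = [vi - pi for vi, pi in zip(v, pred)]
    return [[W[i][j] + lr * err[i] * k[j] for j in range(len(k))]
            for i in range(len(W))]

def sleep(W, kv_cache, n_passes, lr=0.5):
    """Run n_passes offline passes over the cached pairs, then clear the cache."""
    for _ in range(n_passes):
        for k, v in kv_cache:
            W = delta_update(W, k, v, lr)
    kv_cache.clear()  # context now lives only in the fast weights
    return W

def recall_error(W, pairs):
    """Sum of squared errors when reading each value back from its key."""
    return sum(sum((a - b) ** 2 for a, b in zip(matvec(W, k), v))
               for k, v in pairs)

# Usage: longer sleep (larger n_passes) gives lower recall error.
d = 4
keys = [[1.0 if i == j else 0.0 for i in range(d)] for j in range(3)]
values = [[1.0, 2.0, 0.0, -1.0], [0.5, 0.0, 3.0, 1.0], [-2.0, 1.0, 1.0, 0.0]]
pairs = list(zip(keys, values))
for n in (1, 2, 4):
    cache = list(pairs)
    W = sleep([[0.0] * d for _ in range(d)], cache, n_passes=n)
    print(n, recall_error(W, pairs), len(cache))
```

With orthonormal keys and `lr=0.5`, each pass halves the remaining error on every stored pair. The toy therefore mirrors, in a very simplified way, the abstract's claim that more sleep passes N buy better recall without changing wake-time cost.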