Language Models Need Sleep

May 25, 2026
Authors: Sangyun Lee, Sean McLeish, Tom Goldstein, Giulia Fanti
cs.AI

Abstract

Transformer-based large language models are increasingly used for long-horizon tasks, but their attention mechanism scales poorly with context length. To address this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights and then clears its key-value cache. During sleep, the model performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. At inference time, this shifts the extra computation to the sleep phase while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which both a regular transformer and SSM-attention hybrid models fail. We then show that increasing the sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning.
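
The wake/sleep cycle described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the class `ToySleeper`, its methods, the step size `lr`, and the delta-rule update are all hypothetical stand-ins, since the abstract does not specify the learned local rule or the SSM block design. The sketch only shows the control flow: context accumulates in a cache while awake, N offline passes fold it into a fast-weight matrix, and the cache is then cleared.

```python
class ToySleeper:
    """Toy illustration only; every name here is an assumption, not the paper's API."""

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr  # hypothetical step size for the stand-in update rule
        # Fast weights: a dim x dim matrix that persists across sleep cycles.
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        # Wake-time context, stored as (key, value) pairs.
        self.kv_cache = []

    def recall(self, key):
        """Read from the fast weights alone (matrix-vector product)."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.fast_weights]

    def observe(self, key, value):
        """Wake phase: append new context to the cache."""
        self.kv_cache.append((key, value))

    def sleep(self, n_passes):
        """Sleep phase: n_passes offline passes over the cache, then clear it.

        The update is a plain delta rule (an assumption standing in for the
        learned local rule mentioned in the abstract).
        """
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = self.recall(key)
                err = [v - p for v, p in zip(value, pred)]
                for i in range(self.dim):
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * err[i] * key[j]
        self.kv_cache.clear()


model = ToySleeper(dim=2)
model.observe([1.0, 0.0], [2.0, -1.0])
model.sleep(n_passes=8)
# The cache is now empty; the association is served from the fast weights.
print(model.recall([1.0, 0.0]))
```

In this toy setting, each additional pass halves the remaining recall error for the stored pair, so a longer sleep gives a closer reconstruction. That behavior is a property of the delta rule chosen here and should not be read as reproducing the paper's reported results.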