Language Models Need Sleep

May 25, 2026
Authors: Sangyun Lee, Sean McLeish, Tom Goldstein, Giulia Fanti
cs.AI

Abstract

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning.
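
The wake/sleep cycle described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes a learned local rule acting on fast weights inside SSM blocks, whereas the sketch below substitutes a fixed delta rule on a plain linear associative memory. The class `SleepConsolidator`, its methods, the learning rate, and the helper `recall_error` are all hypothetical names chosen for illustration. The sketch only shows the overall shape of the idea: context accumulates in a cache while awake, N offline passes fold it into persistent weights, and the cache is then cleared.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SleepConsolidator:
    """Toy stand-in for sleep-like consolidation (hypothetical, not the paper's model)."""

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr  # assumed fixed step size; the paper's update rule is learned
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.kv_cache = []  # recent context, held only while "awake"

    def observe(self, key, value):
        """Wake phase: store a key/value pair in the cache."""
        self.kv_cache.append((key, value))

    def sleep(self, n_passes):
        """Sleep phase: make n_passes over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        # Delta rule: uses only the local error and the key.
                        self.fast_weights[i][j] += self.lr * err * key[j]
        self.kv_cache.clear()

    def recall(self, key):
        """Wake-time read: one matrix-vector product, independent of n_passes."""
        return matvec(self.fast_weights, key)


def recall_error(n_passes):
    """Largest absolute recall error after sleeping for n_passes."""
    mem = SleepConsolidator(dim=3)
    pairs = [
        ([1.0, 0.0, 0.0], [1.0, 2.0, 3.0]),
        ([0.0, 1.0, 0.0], [4.0, 5.0, 6.0]),
    ]
    for key, value in pairs:
        mem.observe(key, value)
    mem.sleep(n_passes)
    return max(
        abs(a - b)
        for key, value in pairs
        for a, b in zip(mem.recall(key), value)
    )


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, recall_error(n))
```

In this toy setting, with one-hot keys and a step size of 0.5, each additional pass halves the remaining recall error, while the cost of `recall` stays the same. That mirrors, in a much simplified form, the trade-off the abstract describes: extra computation is spent during sleep, not at wake-time prediction.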