Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
June 2, 2026
Authors: Ali Behrouz, Farnoosh Hashemi, Vahab Mirrokni
cs.AI
Abstract
The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results on tasks that require instant prediction or in-context learning, existing models lack the ability to learn continually and to effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to learn continually, distill their fragile short-term memories into stable long-term knowledge through replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, in which the memories of a smaller self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, in which the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.