Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

Abstract

The past few decades have seen significant advances in the design of machine learning algorithms, from early task-specific shallow models to more general deep Large Language Models (LLMs). Although they show promising results on tasks that require instant prediction or in-context learning, existing models lack the ability to learn continually and to transfer their temporal in-context knowledge effectively into their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to learn continually, distill their fragile short-term memories into stable long-term knowledge through replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, in which the memories of a smaller self are distilled into a larger network, providing more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., a combination of on-policy distillation and Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase in which the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
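
To make the on-policy distillation ingredient of Knowledge Seeding more concrete, the following is a minimal toy sketch, not the authors' Generalized Distillation procedure. It assumes single-token categorical distributions: a fixed teacher distribution stands in for the smaller self, and a vector of student logits stands in for the larger network. The student minimizes the reverse KL divergence, i.e. the expectation under its own distribution of log q(x) - log p(x), which is the quantity an on-policy method would estimate from student samples; here the expectation is computed exactly for determinism. All function names and hyperparameters (`distill`, `steps`, `lr`) are hypothetical, and the RL-based imitation component is not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher): expectation taken under the student."""
    return sum(
        q * (math.log(q) - math.log(p))
        for q, p in zip(student_probs, teacher_probs)
        if q > 0.0
    )


def reverse_kl_grad(student_logits, teacher_probs):
    """Exact gradient of KL(softmax(z) || p) with respect to the logits z.

    For q = softmax(z): dKL/dz_k = q_k * ((log q_k - log p_k) - KL).
    """
    q = softmax(student_logits)
    kl = reverse_kl(q, teacher_probs)
    return [
        qk * ((math.log(qk) - math.log(pk)) - kl)
        for qk, pk in zip(q, teacher_probs)
    ]


def distill(student_logits, teacher_probs, steps=500, lr=0.5):
    """Plain gradient descent on the reverse KL (hypothetical settings)."""
    logits = list(student_logits)
    history = []
    for _ in range(steps):
        history.append(reverse_kl(softmax(logits), teacher_probs))
        grad = reverse_kl_grad(logits, teacher_probs)
        logits = [z - lr * g for z, g in zip(logits, grad)]
    history.append(reverse_kl(softmax(logits), teacher_probs))
    return logits, history


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]   # toy "smaller self" distribution (assumed)
    student = [0.0, 0.0, 0.0]   # student starts from uniform logits
    final_logits, kl_history = distill(student, teacher)
    print("initial reverse KL:", kl_history[0])
    print("final reverse KL:  ", kl_history[-1])
    print("student probs:     ", softmax(final_logits))
```

In a sequence model the same objective would be estimated per token on sequences sampled from the student, with the teacher scoring those samples; the exact expectation used here is only feasible because the toy vocabulary has three entries.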