Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

Abstract

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Although existing models show promising results on tasks that require instant prediction or in-context learning, they lack the ability to learn continually and to effectively transfer their temporal in-context knowledge into their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to learn continually, distill their fragile short-term memories into stable long-term knowledge through replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, in which the memories of a smaller self are distilled into a larger network, providing more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase in which the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
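The abstract does not spell out the Knowledge Seeding objective, so the following is only a rough sketch of one common form of on-policy distillation, not the authors' Generalized Distillation: a per-token reverse KL divergence, KL(student || teacher), averaged over a sequence that the student itself sampled. Here the "teacher" would be the smaller self and the "student" the larger network. The function names and the use of raw logit lists are assumptions made for illustration, and the RL-based imitation learning component is omitted entirely.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for a single token position."""
    log_s = log_softmax(student_logits)
    log_t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_s, log_t))


def on_policy_distill_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over one student-sampled sequence.

    Both arguments are lists of per-position logit vectors, scored on the
    same tokens (hypothetical inputs; in practice they would come from the
    two models' forward passes on the student's own sample).
    """
    if len(student_steps) != len(teacher_steps) or not student_steps:
        raise ValueError("expected two non-empty sequences of equal length")
    losses = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)
```

In this sketch the loss is zero when the larger network already matches the smaller self on the sampled tokens and grows as their predictive distributions diverge, which is the sense in which distillation can preserve existing knowledge while moving it into a model with more capacity.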