Self-Distillation Enables Continual Learning
January 27, 2026
Authors: Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal
cs.AI
Abstract
Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.
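Below is a minimal sketch of the self-distillation idea described in the abstract, assuming a HuggingFace-style causal LM. The model name, prompt strings, loss choice (token-level KL), and hyperparameters are illustrative assumptions, not the authors' implementation: the student generates an on-policy response to the bare task prompt, while the same model, conditioned on an expert demonstration in context, acts as the teacher whose token distributions the student is trained to match.

```python
# Hedged sketch of demonstration-conditioned self-distillation.
# All names (demo_text, task_prompt, one_step) are hypothetical.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
student = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

demo_text = "<expert demonstration for the new task>\n"   # hypothetical
task_prompt = "<task prompt, without the demonstration>"  # hypothetical

def one_step():
    # 1) Generate an on-policy response from the student on the bare prompt.
    prompt_ids = tok(task_prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        rollout = student.generate(prompt_ids, max_new_tokens=64, do_sample=True)
    response_ids = rollout[:, prompt_ids.shape[1]:]
    resp_len = response_ids.shape[1]

    # 2) Teacher = the same model (gradients disabled), conditioned on the
    #    demonstration in context; in-context learning supplies the target behavior.
    teacher_ctx = tok(demo_text + task_prompt, return_tensors="pt").input_ids
    teacher_input = torch.cat([teacher_ctx, response_ids], dim=1)
    with torch.no_grad():
        teacher_logits = student(teacher_input).logits[:, -resp_len - 1:-1]

    # 3) Student scores the same on-policy response without the demonstration.
    student_input = torch.cat([prompt_ids, response_ids], dim=1)
    student_logits = student(student_input).logits[:, -resp_len - 1:-1]

    # 4) Distill: pull the student's token distributions toward the
    #    demonstration-conditioned teacher on the student's own samples.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the training signal comes from the model's own on-policy samples rather than from off-policy demonstration tokens, this style of update is what the abstract credits with preserving prior capabilities while the new skill is acquired.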