

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

May 7, 2026
Authors: Yiqiao Jin, Yiyang Wang, Lucheng Fu, Yijia Xiao, Yinyi Luo, Haoxin Liu, B. Aditya Prakash, Josiah Hester, Jindong Wang, Srijan Kumar
cs.AI

Abstract

Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without relying on stronger external teachers. However, SD in autoregressive LLMs remains challenging because self-generated trajectories are free-form, correctness is task-dependent, and plausible rationales can still provide unstable or unreliable supervision. Existing methods mainly examine isolated design choices, leaving their effectiveness, roles, and interactions unclear. In this paper, we propose UniSD, a unified framework to systematically study self-distillation. UniSD integrates complementary mechanisms that address supervision reliability, representation alignment, and training stability, including multi-teacher agreement, EMA teacher stabilization, token-level contrastive learning, feature matching, and divergence clipping. Across six benchmarks and six models from three model families, UniSD reveals when self-distillation improves over static imitation, which components drive the gains, and how these components interact across tasks. Guided by these insights, we construct UniSDfull, an integrated pipeline that combines complementary components and achieves the strongest overall performance, improving over the base model by +5.4 points and over the strongest baseline by +2.8 points. Extensive evaluation highlights self-distillation as a practical and steerable approach for efficient LLM adaptation without stronger external teachers.
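To make two of the stability mechanisms named in the abstract concrete, below is a minimal, hedged sketch of what EMA teacher stabilization and divergence clipping could look like in a token-level self-distillation loss. This is not the paper's implementation; the function names (`ema_update`, `clipped_kd_loss`) and hyperparameters (`decay`, `tau`, `clip_value`) are illustrative assumptions.

```python
# Illustrative sketch only (PyTorch): EMA teacher update plus a token-level
# KL distillation loss with per-token divergence clipping. Names and values
# are hypothetical, not taken from the UniSD paper.
import torch
import torch.nn.functional as F


def ema_update(teacher, student, decay=0.999):
    """Move the EMA teacher's parameters toward the student's (in-place)."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)


def clipped_kd_loss(student_logits, teacher_logits, tau=1.0, clip_value=5.0):
    """Token-level KL(teacher || student) with per-token divergence clipping.

    Clipping caps the contribution of tokens where teacher and student
    disagree strongly, which is one way to stabilize the supervision signal.
    """
    s_logp = F.log_softmax(student_logits / tau, dim=-1)
    t_logp = F.log_softmax(teacher_logits / tau, dim=-1)
    # Per-token KL divergence, summed over the vocabulary dimension.
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(dim=-1)
    return kl.clamp(max=clip_value).mean()


if __name__ == "__main__":
    # Toy check on random logits: 2 sequences, 4 tokens each, vocab size 10.
    student_logits = torch.randn(2, 4, 10, requires_grad=True)
    teacher_logits = torch.randn(2, 4, 10)
    loss = clipped_kd_loss(student_logits, teacher_logits)
    loss.backward()
    print(f"clipped KD loss: {loss.item():.4f}")
```

In this reading, the EMA teacher is refreshed from the student after each optimizer step, and the clipped KL term replaces a plain imitation loss; how UniSD actually combines these with multi-teacher agreement, contrastive learning, and feature matching is detailed in the paper itself.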