

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

May 7, 2026
作者: Yiqiao Jin, Yiyang Wang, Lucheng Fu, Yijia Xiao, Yinyi Luo, Haoxin Liu, B. Aditya Prakash, Josiah Hester, Jindong Wang, Srijan Kumar
cs.AI

Abstract

Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without relying on stronger external teachers. However, SD in autoregressive LLMs remains challenging because self-generated trajectories are free-form, correctness is task-dependent, and plausible rationales can still provide unstable or unreliable supervision. Existing methods mainly examine isolated design choices, leaving their effectiveness, roles, and interactions unclear. In this paper, we propose UniSD, a unified framework to systematically study self-distillation. UniSD integrates complementary mechanisms that address supervision reliability, representation alignment, and training stability, including multi-teacher agreement, EMA teacher stabilization, token-level contrastive learning, feature matching, and divergence clipping. Across six benchmarks and six models from three model families, UniSD reveals when self-distillation improves over static imitation, which components drive the gains, and how these components interact across tasks. Guided by these insights, we construct UniSDfull, an integrated pipeline that combines complementary components and achieves the strongest overall performance, improving over the base model by +5.4 points and the strongest baseline by +2.8 points. Extensive evaluation highlights self-distillation as a practical and steerable approach for efficient LLM adaptation without stronger external teachers.
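The abstract names EMA teacher stabilization and divergence clipping among the mechanisms UniSD combines. The following is an illustrative sketch only, not the paper's published code: the function names, the 0.999 decay, and the clipping threshold are assumptions meant to show in rough terms how an EMA-updated self-teacher and a per-token clipped distillation loss could be implemented.

```python
# Hedged sketch of two mechanisms named in the abstract; all names and
# hyperparameters here are illustrative assumptions, not UniSD's actual code.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Move the frozen EMA teacher slowly toward the current student weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

def clipped_kd_loss(student_logits, teacher_logits, clip=5.0, temperature=1.0):
    """Token-level KL(teacher || student) with per-token divergence clipping,
    so unreliable self-generated supervision cannot dominate the gradient."""
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(dim=-1)  # shape: (batch, seq_len)
    return kl.clamp(max=clip).mean()
```

In a training loop under these assumptions, `clipped_kd_loss` would be applied to self-generated trajectories and `ema_update` called after each optimizer step, keeping the teacher a smoothed copy of the student rather than a stronger external model.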