UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

Abstract

Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without relying on stronger external teachers. However, SD in autoregressive LLMs remains challenging because self-generated trajectories are free-form, correctness is task-dependent, and even plausible rationales can provide unstable or unreliable supervision. Existing methods mainly examine isolated design choices, leaving their effectiveness, roles, and interactions unclear. In this paper, we propose UniSD, a unified framework for systematically studying self-distillation. UniSD integrates complementary mechanisms that address supervision reliability, representation alignment, and training stability: multi-teacher agreement, exponential moving average (EMA) teacher stabilization, token-level contrastive learning, feature matching, and divergence clipping. Across six benchmarks and six models from three model families, UniSD reveals when self-distillation improves over static imitation, which components drive the gains, and how these components interact across tasks. Guided by these insights, we construct UniSDfull, an integrated pipeline that combines complementary components and achieves the strongest overall performance, improving over the base model by +5.4 points and over the strongest baseline by +2.8 points. Extensive evaluation highlights self-distillation as a practical and steerable approach for efficient LLM adaptation without stronger external teachers.
