Embarrassingly Simple Self-Distillation Improves Code Generation

Abstract

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.
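
The sampling step mentioned in the abstract, temperature scaling followed by truncation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which truncation rule or which values SSD uses, so the top-k and top-p (nucleus) rules, the function names `truncated_probs` and `sample_token`, and all numeric settings below are assumptions chosen only for illustration. The subsequent supervised fine-tuning on the sampled solutions is not shown.

```python
import math
import random


def truncated_probs(logits, temperature=1.0, top_k=None, top_p=None):
    """Return next-token probabilities after temperature scaling and truncation.

    Hypothetical helper: top-k / top-p are assumed truncation rules,
    not settings taken from the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (at least one).
    if top_k is not None:
        order = order[:max(1, top_k)]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p is not None:
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept

    # Zero out the discarded tail and renormalize the surviving tokens.
    keep = set(order)
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]


def sample_token(logits, rng, **kwargs):
    """Draw one token id from the truncated distribution."""
    probs = truncated_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up logits for a 4-token vocabulary
    print(truncated_probs(toy_logits, temperature=0.7, top_k=2))
    print(sample_token(toy_logits, random.Random(0), temperature=0.7, top_p=0.8))
```

In this toy setting, truncation removes the low-probability tail entirely before sampling, while temperature controls how much diversity remains among the tokens that survive. That is only a decoding-time analogy to the tail suppression the abstract attributes to SSD, which reshapes the model's own distributions through fine-tuning.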
