The Valley of Code Reasoning: Scaling Knowledge Distillation of Large Language Models

Abstract

Distilling the thinking traces of a Large Language Model (LLM) with reasoning capabilities into a smaller model has proven effective. Yet there is little work on how model performance scales with the quantity of distillation data. In this work, we study the scaling trend of distilling competitive coding skills on two small non-reasoning LLMs. We validate the hypothesis that there is a valley of code reasoning: downstream performance on competitive coding first drops as data quantity increases, then steadily increases in a sharper-than-log-linear fashion. Having identified the trend, we further fine-tune the models at two different distillation stages on the same data to ground conclusions on their respective learning phases. We learn that across stages in the low and medium-low data regimes, small models benefit significantly more from easier coding questions than from harder ones. We also find that, surprisingly, the correctness of outputs in training data makes no difference to distillation outcomes. Our work represents a step forward in understanding the training dynamics of code reasoning distillation beyond intuition.
