Mentor-KD: Making Small Language Models Better Multi-step Reasoners

Abstract

Large Language Models (LLMs) have displayed remarkable performance across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recently, studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation, which transfers this reasoning ability of LLMs by fine-tuning language models on multi-step rationales generated by LLM teachers. However, these studies have not adequately addressed two challenges that stem from the insufficient distillation sets obtained from the LLM teacher model: 1) data quality and 2) soft label provision. In this paper, we propose Mentor-KD, which effectively distills the multi-step reasoning capability of LLMs to smaller LMs while addressing these challenges. Specifically, we exploit a mentor, an intermediate-sized model fine-tuned on the target task, to augment the distillation set with additional CoT annotations and to provide soft labels for the student model during reasoning distillation. We conduct extensive experiments and confirm Mentor-KD's effectiveness across various models and complex reasoning tasks.
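The abstract does not give Mentor-KD's training objective. The sketch below illustrates only the general idea of combining hard CoT annotations with a mentor's soft labels. It assumes a standard Hinton-style distillation loss: per-token cross-entropy on the rationale tokens, plus a temperature-scaled KL term that pulls the student toward the mentor's distribution. The function names, the mixing weight `alpha`, the `temperature`, and the `T**2` scaling are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over a single token's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; entries with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: List[List[float]],
    mentor_logits: List[List[float]],
    target_ids: List[int],
    alpha: float = 0.5,        # hypothetical weight on the soft-label term
    temperature: float = 2.0,  # hypothetical softening temperature
) -> float:
    """Hypothetical per-token mix of hard-label CE and mentor soft-label KL.

    student_logits / mentor_logits: one logit vector per rationale token.
    target_ids: gold token ids of the (teacher- or mentor-annotated) rationale.
    """
    ce_total, kl_total = 0.0, 0.0
    for s_logits, m_logits, target in zip(student_logits, mentor_logits, target_ids):
        # Hard-label term: cross-entropy against the annotated rationale token.
        ce_total += -math.log(softmax(s_logits)[target])
        # Soft-label term: match the mentor's softened distribution (T^2 rescales gradients).
        p_mentor = softmax(m_logits, temperature)
        p_student = softmax(s_logits, temperature)
        kl_total += kl_divergence(p_mentor, p_student) * temperature ** 2
    n = len(target_ids)
    return (1 - alpha) * ce_total / n + alpha * kl_total / n


# Usage: identical student and mentor logits make the KL term vanish.
loss = distillation_loss([[0.0, 0.0]], [[0.0, 0.0]], [0], alpha=0.5)
```

When the student already matches the mentor, the loss reduces to `(1 - alpha)` times the cross-entropy on the rationale. Raising `alpha` shifts weight toward the mentor's soft labels.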
