Mentor-KD: Making Small Language Models Better Multi-step Reasoners
October 11, 2024
Authors: Hojae Lee, Junho Kim, SangKeun Lee
cs.AI
Abstract
Large Language Models (LLMs) have displayed remarkable performances across
various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recently,
studies have proposed a Knowledge Distillation (KD) approach, reasoning
distillation, which transfers the reasoning ability of LLMs by fine-tuning
language models on multi-step rationales generated by LLM teachers.
However, they have inadequately considered two challenges regarding
insufficient distillation sets from the LLM teacher model, in terms of 1) data
quality and 2) soft label provision. In this paper, we propose Mentor-KD, which
effectively distills the multi-step reasoning capability of LLMs to smaller LMs
while addressing the aforementioned challenges. Specifically, we exploit a
mentor, an intermediate-sized, task-specific fine-tuned model, to augment the
distillation set with additional CoT annotations and to provide soft labels
for the student model during
reasoning distillation. We conduct extensive experiments and confirm
Mentor-KD's effectiveness across various models and complex reasoning tasks.
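The abstract describes the training signal only at a high level. Below is a minimal, illustrative PyTorch sketch (not the authors' released code) of a generic reasoning-distillation objective of the kind described: a hard-label cross-entropy term on CoT rationale tokens combined with a soft-label KL term against the mentor's output distribution. The weight `alpha` and the softmax `temperature` are assumed hyperparameters for illustration, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def mentor_kd_loss(student_logits, mentor_logits, target_ids, alpha=0.3, temperature=2.0):
    """Hard-label CE on rationale tokens plus KL to the mentor's soft labels (sketch)."""
    vocab = student_logits.size(-1)

    # Hard-label term: next-token cross-entropy on the (mentor-augmented) CoT rationale.
    ce = F.cross_entropy(student_logits.reshape(-1, vocab), target_ids.reshape(-1))

    # Soft-label term: KL divergence between temperature-scaled distributions,
    # with the usual T^2 scaling so gradient magnitudes stay comparable across temperatures.
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    mentor_prob = F.softmax(mentor_logits / temperature, dim=-1)
    kd = F.kl_div(student_logp, mentor_prob, reduction="batchmean") * (temperature ** 2)

    return (1 - alpha) * ce + alpha * kd

# Toy usage with random tensors standing in for real student/mentor forward passes.
B, T, V = 2, 8, 100
student_logits = torch.randn(B, T, V, requires_grad=True)
mentor_logits = torch.randn(B, T, V)       # mentor is frozen during distillation
targets = torch.randint(0, V, (B, T))      # token ids of the CoT rationale
loss = mentor_kd_loss(student_logits, mentor_logits, targets)
loss.backward()
```

In practice the logits would come from the student and mentor language models run over mentor-augmented CoT annotations; the key point the sketch shows is that the mentor supplies both the additional hard-label rationales and the soft-label distribution that a single LLM teacher (accessed only through generated text) cannot provide.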