Mentor-KD: Making Small Language Models Better Multi-step Reasoners

October 11, 2024
Authors: Hojae Lee, Junho Kim, SangKeun Lee
cs.AI

Abstract

Large Language Models (LLMs) have displayed remarkable performance across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recently, studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation, which transfers the reasoning ability of LLMs by fine-tuning language models on multi-step rationales generated by LLM teachers. However, they have inadequately considered two challenges regarding insufficient distillation sets from the LLM teacher model, in terms of 1) data quality and 2) soft label provision. In this paper, we propose Mentor-KD, which effectively distills the multi-step reasoning capability of LLMs to smaller LMs while addressing the aforementioned challenges. Specifically, we exploit a mentor, an intermediate-sized task-specific fine-tuned model, to augment additional CoT annotations and provide soft labels for the student model during reasoning distillation. We conduct extensive experiments and confirm Mentor-KD's effectiveness across various models and complex reasoning tasks.
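To make the distillation setup in the abstract concrete, below is a minimal, hypothetical sketch of a training objective that combines hard-label cross-entropy on mentor-generated CoT rationales with a soft-label term matching the student to the mentor's output distribution. This is not the authors' released code; the function name, hyperparameters (temperature, alpha), and tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def reasoning_distillation_loss(student_logits, mentor_logits, target_ids,
                                temperature=2.0, alpha=0.5, ignore_index=-100):
    """Illustrative KD objective: cross-entropy on mentor-generated CoT rationale
    tokens plus a temperature-scaled KL term toward the mentor's soft labels."""
    # Hard-label term: next-token cross-entropy on the augmented CoT annotations.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=ignore_index,
    )
    # Soft-label term: KL divergence between temperature-softened student and
    # mentor distributions (standard KD scaling by temperature**2).
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(mentor_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Weighted combination; alpha balances hard-label and soft-label supervision.
    return alpha * ce + (1 - alpha) * kl

In practice, student_logits and mentor_logits would come from the small student LM and the intermediate-sized mentor LM over the same rationale sequence, with the mentor kept frozen during student training.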