Mentor-KD: Making Small Language Models Better Multi-step Reasoners

Abstract

Large Language Models (LLMs) have displayed remarkable performance across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recent studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation, which transfers this reasoning ability by fine-tuning smaller language models on multi-step rationales generated by LLM teachers. However, these studies have inadequately considered two challenges that arise from insufficient distillation sets produced by the LLM teacher model: 1) data quality and 2) soft label provision. In this paper, we propose Mentor-KD, which effectively distills the multi-step reasoning capability of LLMs to smaller LMs while addressing both challenges. Specifically, we exploit a mentor, an intermediate-sized, task-specific fine-tuned model, to augment the distillation set with additional CoT annotations and to provide soft labels for the student model during reasoning distillation. We conduct extensive experiments and confirm Mentor-KD's effectiveness across various models and complex reasoning tasks.
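
To make the idea of soft label provision concrete, the following is a minimal sketch of a generic soft-label distillation objective, in which a student is trained against both the gold answer and a mentor's softened output distribution. It is an illustration under stated assumptions, not the loss used by Mentor-KD: the abstract does not specify the objective, and the function names, the temperature, and the mixing weight `alpha` are all hypothetical choices made here for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, mentor_logits, gold_index,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label KD loss (an assumption, not the paper's exact form).

    (1 - alpha) weights the hard-label cross-entropy;
    alpha weights the KL term that pulls the student toward the mentor.
    """
    # Hard-label term: cross-entropy against the gold answer.
    student_probs = softmax(student_logits)
    ce = -math.log(student_probs[gold_index])

    # Soft-label term: match the mentor's temperature-softened distribution.
    mentor_soft = softmax(mentor_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kd = kl_divergence(mentor_soft, student_soft)

    return (1 - alpha) * ce + alpha * kd
```

When the student's logits equal the mentor's, the soft-label term vanishes and only the hard-label term contributes. Setting `alpha` to 0 reduces the objective to ordinary fine-tuning on gold labels, which is one way to see what the soft labels add.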
