Direct Preference Knowledge Distillation for Large Language Models

Abstract

In the field of large language models (LLMs), Knowledge Distillation (KD) is a key technique for transferring capabilities from teacher models to student models. However, existing KD methods face limitations when distilling LLMs, including limited efficiency and the insufficient measurement capability of the traditional KL divergence. It has been shown that LLMs can serve as an implicit reward function, which we define as a supplement to KL divergence. In this work, we propose Direct Preference Knowledge Distillation (DPKD) for LLMs. DPKD uses distribution divergence to represent the preference loss and the implicit reward function. We reformulate KD of LLMs into two stages: first, optimizing an objective consisting of the implicit reward and reverse KL divergence; second, improving the preference probability of teacher outputs over student outputs. We conduct experiments and analysis on various datasets with LLMs ranging from 120M to 13B parameters and demonstrate the broad applicability and effectiveness of the DPKD approach. We also show, through experiments and theoretical analysis, the value and effectiveness of the introduced implicit reward and output preference in KD. The DPKD method outperforms the baseline method in both output response precision and exact match percentage. Code and data are available at https://aka.ms/dpkd.
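The two quantities named in the abstract, reverse KL divergence and the preference probability of teacher outputs over student outputs, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact form of the objective, so the Bradley-Terry (sigmoid) form of the preference probability, the scalar reward inputs, and all function names below are assumptions made purely for illustration.

```python
import math


def reverse_kl(student_probs, teacher_probs):
    """Reverse KL divergence KL(student || teacher) for two discrete
    distributions given as equal-length lists of probabilities."""
    return sum(
        q * math.log(q / p)
        for q, p in zip(student_probs, teacher_probs)
        if q > 0  # terms with zero student probability contribute nothing
    )


def preference_probability(reward_teacher, reward_student):
    """Probability that the teacher output is preferred over the student
    output. ASSUMPTION: a Bradley-Terry (sigmoid) model over scalar rewards."""
    return 1.0 / (1.0 + math.exp(-(reward_teacher - reward_student)))


def preference_loss(reward_teacher, reward_student):
    """Negative log of the preference probability (hypothetical loss form);
    it shrinks as the teacher output's reward exceeds the student's."""
    return -math.log(preference_probability(reward_teacher, reward_student))


if __name__ == "__main__":
    teacher = [0.5, 0.5]
    student = [0.9, 0.1]
    print("reverse KL:", reverse_kl(student, teacher))
    print("preference probability:", preference_probability(0.0, 0.0))  # 0.5
    print("preference loss:", preference_loss(2.0, 0.0))
```

In this toy setting, equal rewards give a preference probability of 0.5, and raising the teacher output's reward relative to the student's lowers the loss, which matches the second stage described above in spirit only.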

