DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Abstract

Despite the success of distillation in large language models (LLMs), most prior work applies identical loss functions to both teacher- and student-generated data. These strategies overlook the synergy between loss formulations and data types, leading to a suboptimal performance boost in student models. To address this, we propose DistiLLM-2, a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses by harnessing this synergy. Our extensive experiments show that DistiLLM-2 not only builds high-performing student models across a wide range of tasks, including instruction-following and code generation, but also supports diverse applications, such as preference alignment and vision-language extensions. These findings highlight the potential of a contrastive approach to enhance the efficacy of LLM distillation by effectively aligning teacher and student models across varied data types.
