Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs

Abstract

Knowledge distillation has emerged as a pivotal technique for transferring knowledge from stronger large language models (LLMs) to smaller, more efficient models. However, traditional distillation approaches face challenges related to knowledge conflicts and high resource demands, particularly when leveraging multiple teacher models. In this paper, we introduce the concept of Knowledge Purification, which consolidates the rationales from multiple teacher LLMs into a single rationale, thereby mitigating conflicts and enhancing efficiency. To investigate the effectiveness of knowledge purification, we further propose five purification methods from various perspectives. Our experiments demonstrate that these methods not only improve the performance of the distilled model but also effectively alleviate knowledge conflicts. Moreover, router-based methods exhibit robust generalization capabilities, underscoring the potential of innovative purification techniques in optimizing multi-teacher distillation and facilitating the practical deployment of powerful yet lightweight models.
