
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs

February 1, 2026
Authors: Ruihan Jin, Pengpeng Shao, Zhengqi Wen, Jinyang Wu, Mingkuan Feng, Shuo Yang, Chu Yuan Zhang, Jianhua Tao
cs.AI

Abstract

Knowledge distillation has emerged as a pivotal technique for transferring knowledge from stronger large language models (LLMs) to smaller, more efficient models. However, traditional distillation approaches face challenges related to knowledge conflicts and high resource demands, particularly when leveraging multiple teacher models. In this paper, we introduce the concept of Knowledge Purification, which consolidates the rationales from multiple teacher LLMs into a single rationale, thereby mitigating conflicts and enhancing efficiency. To investigate the effectiveness of knowledge purification, we further propose five purification methods from various perspectives. Our experiments demonstrate that these methods not only improve the performance of the distilled model but also effectively alleviate knowledge conflicts. Moreover, router-based methods exhibit robust generalization capabilities, underscoring the potential of innovative purification techniques in optimizing multi-teacher distillation and facilitating the practical deployment of powerful yet lightweight models.