
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs

February 1, 2026
Authors: Ruihan Jin, Pengpeng Shao, Zhengqi Wen, Jinyang Wu, Mingkuan Feng, Shuo Yang, Chu Yuan Zhang, Jianhua Tao
cs.AI

Abstract

Knowledge distillation has emerged as a pivotal technique for transferring knowledge from stronger large language models (LLMs) to smaller, more efficient models. However, traditional distillation approaches face challenges related to knowledge conflicts and high resource demands, particularly when leveraging multiple teacher models. In this paper, we introduce the concept of Knowledge Purification, which consolidates the rationales from multiple teacher LLMs into a single rationale, thereby mitigating conflicts and enhancing efficiency. To investigate the effectiveness of knowledge purification, we further propose five purification methods from various perspectives. Our experiments demonstrate that these methods not only improve the performance of the distilled model but also effectively alleviate knowledge conflicts. Moreover, router-based methods exhibit robust generalization capabilities, underscoring the potential of innovative purification techniques in optimizing multi-teacher distillation and facilitating the practical deployment of powerful yet lightweight models.
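To make the idea of consolidating several teachers' rationales concrete, here is a minimal sketch of one possible purification rule. It is not the paper's actual method: the `TeacherOutput` dataclass, the `purify_by_majority_vote` function, and the "keep the shortest rationale whose answer matches the majority answer" heuristic are all hypothetical placeholders standing in for whichever of the five proposed purification methods (e.g., the router-based one) is used.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class TeacherOutput:
    """One teacher LLM's rationale and final answer for a single question."""
    teacher: str
    rationale: str
    answer: str


def purify_by_majority_vote(outputs: List[TeacherOutput]) -> TeacherOutput:
    """Consolidate multiple teacher rationales into a single rationale.

    Hypothetical purification rule (assumption, not the paper's method):
    keep only rationales whose final answer agrees with the majority
    answer, then return the shortest of them, on the assumption that a
    concise, answer-consistent rationale introduces the fewest knowledge
    conflicts into the student's training data.
    """
    majority_answer, _ = Counter(o.answer for o in outputs).most_common(1)[0]
    consistent = [o for o in outputs if o.answer == majority_answer]
    return min(consistent, key=lambda o: len(o.rationale))


if __name__ == "__main__":
    question = "What is 17 * 6?"
    outputs = [
        TeacherOutput("teacher_a", "17*6 = 17*5 + 17 = 85 + 17 = 102.", "102"),
        TeacherOutput("teacher_b", "Six seventeens: 17+17+17+17+17+17 = 102.", "102"),
        TeacherOutput("teacher_c", "17*6 = 96.", "96"),  # conflicting rationale
    ]
    purified = purify_by_majority_vote(outputs)
    # The student model would then be fine-tuned on (question, purified rationale)
    # pairs instead of on all conflicting teacher rationales at once.
    print(question, "->", purified.rationale)
```

The point of any such rule is the same as in the paper: the student sees one purified rationale per example rather than several potentially conflicting ones, which both reduces knowledge conflicts and cuts the volume of distillation data.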