Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs

Abstract

Knowledge distillation has emerged as a pivotal technique for transferring knowledge from stronger large language models (LLMs) to smaller, more efficient models. However, traditional distillation approaches face knowledge conflicts and high resource demands, particularly when they draw on multiple teacher models. In this paper, we introduce the concept of Knowledge Purification, which consolidates the rationales from multiple teacher LLMs into a single rationale, thereby mitigating conflicts and improving efficiency. To investigate the effectiveness of knowledge purification, we further propose five purification methods from various perspectives. Our experiments demonstrate that these methods not only improve the performance of the distilled model but also effectively alleviate knowledge conflicts. Moreover, router-based methods exhibit robust generalization capabilities, underscoring the potential of purification techniques for optimizing multi-teacher distillation and facilitating the practical deployment of powerful yet lightweight models.
