TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
January 28, 2025
Authors: Makoto Shing, Kou Misaki, Han Bao, Sho Yokoi, Takuya Akiba
cs.AI
Abstract
Causal language models have demonstrated remarkable capabilities, but their
size poses significant challenges for deployment in resource-constrained
environments. Knowledge distillation, a widely-used technique for transferring
knowledge from a large teacher model to a small student model, presents a
promising approach for model compression. A significant remaining issue lies in
the major differences between teacher and student models, namely the
substantial capacity gap, mode averaging, and mode collapse, which pose
barriers during distillation. To address these issues, we introduce
Temporally Adaptive Interpolated Distillation (TAID), a novel
knowledge distillation approach that dynamically interpolates student and
teacher distributions through an adaptive intermediate distribution, gradually
shifting from the student's initial distribution towards the teacher's
distribution. We provide a theoretical analysis demonstrating TAID's ability to
prevent mode collapse and empirically show its effectiveness in addressing the
capacity gap while balancing mode averaging and mode collapse. Our
comprehensive experiments demonstrate TAID's superior performance across
various model sizes and architectures in both instruction tuning and
pre-training scenarios. Furthermore, we showcase TAID's practical impact by
developing two state-of-the-art compact foundation models:
TAID-LLM-1.5B for language tasks and TAID-VLM-2B for
vision-language tasks. These results demonstrate TAID's effectiveness in
creating high-performing and efficient models, advancing the development of
more accessible AI technologies.
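To illustrate the interpolation idea described in the abstract, here is a minimal sketch. It is not the authors' implementation. It assumes the intermediate distribution is formed by mixing student and teacher logits with a coefficient `t` in [0, 1], and it substitutes a fixed linear schedule for the adaptive update of `t`, which the abstract does not specify. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolated_target(student_logits, teacher_logits, t):
    """Intermediate distribution: t=0 gives the student, t=1 the teacher.

    Assumption: interpolation happens in logit space.
    """
    mixed = [(1.0 - t) * s + t * p
             for s, p in zip(student_logits, teacher_logits)]
    return softmax(mixed)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def linear_schedule(step, total_steps, t_start=0.0, t_end=1.0):
    """Assumed stand-in for the adaptive schedule: t grows linearly."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + (t_end - t_start) * frac


def distillation_loss(student_logits, teacher_logits, t):
    """KL from the intermediate target to the student's distribution.

    In a real training loop the target would be treated as a constant
    (no gradient flows through it); this sketch only computes the value.
    """
    target = interpolated_target(student_logits, teacher_logits, t)
    return kl_divergence(target, softmax(student_logits))


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]   # illustrative values only
    teacher = [0.0, 3.0, 1.0]
    for step in (0, 5, 10):
        t = linear_schedule(step, total_steps=10)
        print(step, t, distillation_loss(student, teacher, t))
```

At `t = 0` the target equals the student's own distribution, so the loss is zero. As `t` approaches 1 the target becomes the teacher distribution and the loss reduces to a standard KL term between teacher and student, which matches the gradual shift from student to teacher that the abstract describes.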