DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
March 10, 2025
Authors: Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun
cs.AI
Abstract
Despite the success of distillation in large language models (LLMs), most
prior work applies identical loss functions to both teacher- and
student-generated data. These strategies overlook the synergy between loss
formulations and data types, leading to a suboptimal performance boost in
student models. To address this, we propose DistiLLM-2, a contrastive approach
that simultaneously increases the likelihood of teacher responses and decreases
that of student responses by harnessing this synergy. Our extensive experiments
show that DistiLLM-2 not only builds high-performing student models across a
wide range of tasks, including instruction-following and code generation, but
also supports diverse applications, such as preference alignment and
vision-language extensions. These findings highlight the potential of a
contrastive approach to enhance the efficacy of LLM distillation by effectively
aligning teacher and student models across varied data types.
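
To make the contrastive idea in the abstract concrete, the following is a minimal toy sketch, not the loss actually used in DistiLLM-2 (the abstract does not give its form). It assumes we already have the per-token probabilities that the student assigns to a teacher-generated response and to a student-generated response; the function names and the example numbers are hypothetical.

```python
import math


def sequence_log_likelihood(token_probs):
    """Sum of the log-probabilities assigned to each token of a response."""
    return sum(math.log(p) for p in token_probs)


def toy_contrastive_loss(probs_on_teacher_response, probs_on_student_response):
    """Toy objective (an assumption, not the paper's formulation).

    The loss decreases when the student's likelihood of the teacher response
    goes up and when its likelihood of its own response goes down.
    """
    ll_teacher = sequence_log_likelihood(probs_on_teacher_response)
    ll_student = sequence_log_likelihood(probs_on_student_response)
    return -(ll_teacher - ll_student)


# Hypothetical per-token probabilities under the student model.
before = toy_contrastive_loss([0.2, 0.3], [0.6, 0.7])
after = toy_contrastive_loss([0.5, 0.6], [0.3, 0.4])

# The second setting favors the teacher response more, so its loss is lower.
print(before > after)
```

In this sketch, minimizing the loss pushes probability mass toward teacher responses and away from student responses, which is the direction of the update described in the abstract; the specific loss functions paired with each data type are left to the paper itself.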