A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
May 19, 2025
Authors: Jitai Hao, Qiang Huang, Hao Liu, Xinyan Xiao, Zhaochun Ren, Jun Yu
cs.AI
Abstract
Training high-performing Small Language Models (SLMs) remains costly, even
with knowledge distillation and pruning from larger teacher models. Existing
work often faces three key challenges: (1) information loss from hard pruning,
(2) inefficient alignment of representations, and (3) underutilization of
informative activations, particularly from Feed-Forward Networks (FFNs). To
address these challenges, we introduce Low-Rank Clone (LRC), an efficient
pre-training method that constructs SLMs aspiring to behavioral equivalence
with strong teacher models. LRC trains a set of low-rank projection matrices
that jointly enable soft pruning by compressing teacher weights, and activation
clone by aligning student activations, including FFN signals, with those of the
teacher. This unified design maximizes knowledge transfer while removing the
need for explicit alignment modules. Extensive experiments with open-source
teachers (e.g., Llama-3.2-3B-Instruct, Qwen2.5-3B/7B-Instruct) show that LRC
matches or surpasses state-of-the-art models trained on trillions of
tokens while using only 20B tokens, achieving over 1,000x training efficiency.
Our code and model checkpoints are available at
https://github.com/CURRENTF/LowRankClone and
https://huggingface.co/collections/JitaiHao/low-rank-clone-lrc-6828389e96a93f1d4219dfaf.
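The two ideas in the abstract, soft pruning through a low-rank projection and activation clone on FFN signals, can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the shapes, the single up-projection weight, the mean-squared-error loss, and the names `soft_prune`, `activation_clone_loss`, and `projection` are all assumptions made for illustration. In a real setup the projection would be trained and the teacher weights frozen.

```python
import numpy as np


def soft_prune(teacher_weight: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Compress a teacher weight of shape (d_teacher, d_ff) to (d_student, d_ff).

    Hypothetical helper: the student weight is generated from the teacher
    weight through a projection of shape (d_teacher, d_student), instead of
    dropping rows outright as hard pruning would.
    """
    return projection.T @ teacher_weight


def activation_clone_loss(student_act: np.ndarray, teacher_act: np.ndarray) -> float:
    """Mean squared error between student and teacher activations (assumed loss)."""
    return float(np.mean((student_act - teacher_act) ** 2))


rng = np.random.default_rng(0)
d_teacher, d_student, d_ff, batch = 16, 4, 32, 8

# Frozen teacher FFN up-projection (toy values).
w_up_teacher = rng.standard_normal((d_teacher, d_ff))
# The projection is the only trainable object in this sketch.
projection = rng.standard_normal((d_teacher, d_student)) / np.sqrt(d_teacher)

# Soft pruning: derive the student weight from the teacher weight.
w_up_student = soft_prune(w_up_teacher, projection)

# Activation clone: compare FFN signals, which share the width d_ff here.
x_teacher = rng.standard_normal((batch, d_teacher))
x_student = x_teacher @ projection  # toy stand-in for the student's hidden state
loss = activation_clone_loss(x_student @ w_up_student, x_teacher @ w_up_teacher)

print(w_up_student.shape, loss)
```

Because the FFN activations of teacher and student have the same width in this sketch, they can be compared directly, which is one way to read the abstract's claim that no separate alignment module is needed.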