A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

Abstract

Training high-performing Small Language Models (SLMs) remains costly, even with knowledge distillation and pruning from larger teacher models. Existing work often faces three key challenges: (1) information loss from hard pruning, (2) inefficient alignment of representations, and (3) underutilization of informative activations, particularly from Feed-Forward Networks (FFNs). To address these challenges, we introduce Low-Rank Clone (LRC), an efficient pre-training method that constructs SLMs aspiring to behavioral equivalence with strong teacher models. LRC trains a set of low-rank projection matrices that jointly enable soft pruning, by compressing teacher weights, and activation clone, by aligning student activations (including FFN signals) with those of the teacher. This unified design maximizes knowledge transfer while removing the need for explicit alignment modules. Extensive experiments with open-source teachers (e.g., Llama-3.2-3B-Instruct, Qwen2.5-3B/7B-Instruct) show that LRC matches or surpasses state-of-the-art models trained on trillions of tokens while using only 20B tokens, achieving over 1,000x training efficiency. Our code and model checkpoints are available at https://github.com/CURRENTF/LowRankClone and https://huggingface.co/collections/JitaiHao/low-rank-clone-lrc-6828389e96a93f1d4219dfaf.
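The two roles the abstract assigns to the low-rank projection matrices can be illustrated with a small NumPy sketch. This is not the authors' implementation: the sizes, the names `W_teacher`, `P`, `student_weight`, and `clone_loss`, the use of a single FFN up-projection, the shared FFN width, and the mean-squared-error objective are all assumptions chosen for illustration. The student weight is generated by multiplying a frozen teacher weight with a trainable projection (soft pruning), and the same projection is trained so that the student's FFN activations match the teacher's (activation clone).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the FFN width is shared so activations are comparable.
d_teacher, d_student, d_ffn, batch = 64, 16, 128, 8

# Frozen teacher FFN up-projection (d_teacher -> d_ffn).
W_teacher = rng.standard_normal((d_ffn, d_teacher)) / np.sqrt(d_teacher)

# Trainable low-rank projection (the only trained parameter in this sketch).
P = rng.standard_normal((d_teacher, d_student)) / np.sqrt(d_teacher)


def student_weight(W_teacher, P):
    """Soft pruning: derive the student weight by compressing the teacher weight."""
    return W_teacher @ P  # shape (d_ffn, d_student)


def clone_loss(h_student, h_teacher, W_teacher, P):
    """Activation clone: MSE between student and teacher FFN activations."""
    a_teacher = h_teacher @ W_teacher.T
    a_student = h_student @ student_weight(W_teacher, P).T
    return np.mean((a_student - a_teacher) ** 2)


def clone_grad(h_student, h_teacher, W_teacher, P):
    """Analytic gradient of clone_loss with respect to P."""
    a_teacher = h_teacher @ W_teacher.T
    a_student = h_student @ student_weight(W_teacher, P).T
    residual = 2.0 * (a_student - a_teacher) / a_student.size
    return W_teacher.T @ residual.T @ h_student  # shape (d_teacher, d_student)


# Random stand-ins for hidden states entering the FFN.
h_teacher = rng.standard_normal((batch, d_teacher))
h_student = rng.standard_normal((batch, d_student))

loss_before = clone_loss(h_student, h_teacher, W_teacher, P)
P_updated = P - 0.1 * clone_grad(h_student, h_teacher, W_teacher, P)
loss_after = clone_loss(h_student, h_teacher, W_teacher, P_updated)
print(f"clone loss: {loss_before:.4f} -> {loss_after:.4f}")
```

Because the student weight is a function of the teacher weight and `P`, a single gradient step on `P` both reshapes the compressed weight and pulls the student activations toward the teacher's, which is the sense in which no separate alignment module is needed in this toy setting.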
