A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

Abstract

Training high-performing Small Language Models (SLMs) remains costly, even with knowledge distillation and pruning from larger teacher models. Existing work often faces three key challenges: (1) information loss from hard pruning, (2) inefficient alignment of representations, and (3) underutilization of informative activations, particularly those from Feed-Forward Networks (FFNs). To address these challenges, we introduce Low-Rank Clone (LRC), an efficient pre-training method that constructs SLMs aspiring to behavioral equivalence with strong teacher models. LRC trains a set of low-rank projection matrices that jointly enable soft pruning, by compressing teacher weights, and activation clone, by aligning student activations, including FFN signals, with those of the teacher. This unified design maximizes knowledge transfer while removing the need for explicit alignment modules. Extensive experiments with open-source teachers (e.g., Llama-3.2-3B-Instruct, Qwen2.5-3B/7B-Instruct) show that LRC matches or surpasses state-of-the-art models trained on trillions of tokens while using only 20B tokens, achieving over 1,000x training efficiency. Our code and model checkpoints are available at https://github.com/CURRENTF/LowRankClone and https://huggingface.co/collections/JitaiHao/low-rank-clone-lrc-6828389e96a93f1d4219dfaf.
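
To make the two roles of the projection matrices concrete, the following is a minimal NumPy sketch of one plausible reading of the abstract, not the released implementation. It assumes a single square linear layer, one shared projection matrix `P`, and a mean-squared-error alignment loss. The names `student_weight` and `clone_loss`, the toy dimensions, and the exact form of the loss are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; real models are far larger.
d_teacher, d_student, n_tokens = 16, 4, 8

# Frozen teacher weight for one linear layer (d_teacher -> d_teacher).
W_teacher = rng.normal(size=(d_teacher, d_teacher))

# Trainable low-rank projection (assumed form): d_teacher -> d_student.
P = rng.normal(size=(d_teacher, d_student)) / np.sqrt(d_teacher)


def student_weight(W_teacher, P):
    """Soft pruning (assumed form): compress the teacher weight through P
    instead of deleting rows or columns outright."""
    return P.T @ W_teacher @ P  # shape (d, d) where d = P.shape[1]


def clone_loss(x_teacher, W_teacher, P):
    """Activation clone (assumed form): MSE between the student's
    activations and the teacher's activations mapped into student space."""
    h_teacher = x_teacher @ W_teacher                      # (n, d_teacher)
    x_student = x_teacher @ P                              # (n, d)
    h_student = x_student @ student_weight(W_teacher, P)   # (n, d)
    return float(np.mean((h_student - h_teacher @ P) ** 2))


# Random stand-in for teacher hidden states entering the layer.
x = rng.normal(size=(n_tokens, d_teacher))

print("student weight shape:", student_weight(W_teacher, P).shape)
print("clone loss with random P:", clone_loss(x, W_teacher, P))
```

In this sketch the same matrix `P` both generates the student weight and defines the alignment target, so no separate alignment module appears. As a sanity check on the formulation, setting `P` to the identity (no compression) makes the student identical to the teacher and drives the loss to zero.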
