

BitNet Distillation

October 15, 2025
Authors: Xun Wu, Shaohan Huang, Wenhui Wang, Ting Song, Li Dong, Yan Xia, Furu Wei
cs.AI

Abstract

In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost. Specifically, BitDistill incorporates three key techniques: the SubLN module, as introduced in BitNet; multi-head attention distillation, based on MiniLM; and continual pre-training, which serves as a crucial warm-up step to mitigate the scalability issue of the performance gap between fine-tuned full-precision and 1.58-bit LLMs on specific tasks. Experimental results show that BitDistill achieves performance comparable to its full-precision counterpart models across model sizes, while enabling up to 10x memory savings and 2.65x faster inference on CPUs. Code is available at https://github.com/microsoft/BitNet.
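The "1.58-bit" format referred to above means each weight takes one of the ternary values {-1, 0, 1}. As a rough illustration only (not taken from the BitDistill or BitNet code), the sketch below shows one common way such ternarization can be done, following the absmean scaling scheme described in the BitNet b1.58 line of work; the function name `ternary_quantize` and the per-tensor scaling granularity are assumptions made for this example.

```python
# Minimal sketch, assuming absmean-style ternarization as in BitNet b1.58.
# Not the official implementation: per-tensor scaling and the helper name
# ternary_quantize are illustrative choices for this example.
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Map a full-precision weight tensor to ternary values {-1, 0, 1}.

    Returns the ternary tensor and a scale such that w ≈ scale * w_ternary.
    """
    scale = w.abs().mean().clamp(min=eps)       # absmean scaling factor
    w_ternary = (w / scale).round().clamp(-1, 1)  # round, then clip to {-1, 0, 1}
    return w_ternary, scale

# Example: ternarize a toy weight matrix and inspect the approximation error.
w = torch.randn(4, 8)
w_q, s = ternary_quantize(w)
print(w_q.unique())                 # values drawn from {-1., 0., 1.}
print((w - s * w_q).abs().mean())   # mean reconstruction error
```

In practice, a ternary weight plus a single scale per tensor is what allows the reported memory savings and CPU-friendly inference, since matrix multiplication against {-1, 0, 1} weights reduces to additions and subtractions.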