
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

March 5, 2026
作者: Di Zhang, Xun Wu, Shaohan Huang, Yudong Wang, Hanyong Shao, Yingbo Hao, Zewen Chi, Li Dong, Ting Song, Yan Xia, Zhifang Sui, Furu Wei
cs.AI

Abstract

Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that, for the first time, jointly applies 1.58-bit quantization and dynamic N:M sparsification while maintaining stable training. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels, and it can tolerate higher structured sparsity before accuracy collapses. Moreover, using our custom sparse tensor core, Sparse-BitNet achieves substantial speedups in both training and inference, reaching up to 1.30×. These results highlight that combining extremely low-bit quantization with semi-structured N:M sparsity is a promising direction for efficient LLMs. Code available at https://github.com/AAzdi/Sparse-BitNet
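To make the two techniques concrete, here is a minimal sketch of how they compose, not taken from the Sparse-BitNet codebase: BitNet-b1.58-style ternary quantization (round each weight to {-1, 0, +1} after scaling by the mean absolute value) followed by 2:4 semi-structured pruning (keep the two largest-magnitude entries in every group of four). The function names and group sizes are illustrative assumptions, not the paper's API.

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """BitNet b1.58-style quantization: scale by the mean absolute
    value, then round each weight to the ternary set {-1, 0, +1}."""
    scale = np.abs(w).mean() + eps
    return np.clip(np.round(w / scale), -1, 1), scale

def nm_sparsify(w, n=2, m=4):
    """Semi-structured N:M sparsity: in every group of m consecutive
    weights, zero out all but the n largest-magnitude entries."""
    flat = w.reshape(-1, m)
    # Indices of the (m - n) smallest-magnitude entries per group.
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (flat * mask).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))          # toy weight matrix
q, scale = ternary_quantize(w)        # ternary weights + scale factor
sparse_q = nm_sparsify(q)             # 2:4 sparsity on top of ternary
# Each group of 4 now holds at most 2 nonzeros, each in {-1, 0, +1}.
```

The intuition the paper's finding suggests: ternary quantization already drives many weights to exactly zero, so the additional zeros forced by the 2:4 mask overlap with values the quantizer would have discarded anyway, which is one plausible reason the accuracy cost is lower than for full-precision weights.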