Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

Abstract

Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that jointly applies 1.58-bit quantization and dynamic N:M sparsification and, for the first time, does so with stable training. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels, and it tolerates higher structured sparsity before accuracy collapses. Moreover, using our custom sparse tensor core, Sparse-BitNet achieves substantial speedups in both training and inference, reaching up to 1.30×. These results highlight that combining extremely low-bit quantization with semi-structured N:M sparsity is a promising direction for efficient LLMs. Code is available at https://github.com/AAzdi/Sparse-BitNet
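To make the two ingredients named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the common 2:4 pattern (keep the 2 largest-magnitude weights in every group of 4) and the absmean ternary rounding usually associated with 1.58-bit BitNet. The function names (`nm_mask`, `ternary_quantize`, `sparse_ternary`) are hypothetical, and the order in which the mask and the quantizer are combined is an assumption; the abstract does not specify it.

```python
def nm_mask(row, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude entries in each group of m."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest |w| in this group; ties are broken by position.
        keep = set(sorted(range(m), key=lambda i: (-abs(group[i]), i))[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


def ternary_quantize(row, eps=1e-8):
    """Absmean ternary quantization: divide by mean |w|, round, clip to {-1, 0, +1}."""
    scale = sum(abs(w) for w in row) / len(row)
    q = [max(-1, min(1, round(w / (scale + eps)))) for w in row]
    return q, scale


def sparse_ternary(row, n=2, m=4):
    """Assumed ordering: build the mask from the latent weights, quantize, then mask."""
    mask = nm_mask(row, n, m)
    q, scale = ternary_quantize(row)
    return [qi * mi for qi, mi in zip(q, mask)], scale


if __name__ == "__main__":
    weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
    codes, scale = sparse_ternary(weights)
    print(codes, scale)
```

One intuition this sketch illustrates: ternary rounding already maps small weights to zero, so an N:M mask often removes entries that the quantizer would have zeroed anyway. Whether this accounts for the compatibility reported above is not something the abstract states.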

