BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
April 25, 2025
Authors: Hongyu Wang, Shuming Ma, Furu Wei
cs.AI
Abstract
Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by
activation outliers, which complicate quantization to low bit-widths. We
introduce BitNet v2, a novel framework enabling native 4-bit activation
quantization for 1-bit LLMs. To tackle outliers in attention and feed-forward
network activations, we propose H-BitLinear, a module applying an online
Hadamard transformation prior to activation quantization. This transformation
smooths sharp activation distributions into more Gaussian-like forms, suitable
for low-bit representation. Experiments show BitNet v2 trained from scratch
with 8-bit activations matches BitNet b1.58 performance. Crucially, BitNet v2
achieves minimal performance degradation when trained with native 4-bit
activations, significantly reducing memory footprint and computational cost for
batched inference.
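As a rough illustration of the mechanism the abstract describes, the sketch below applies a normalized Hadamard transform to activations before 4-bit quantization, then multiplies by ternary weights. This is a minimal sketch only: the function names, the per-token absmax quantization scheme, and the ternary-weight matmul are illustrative assumptions, not the paper's actual H-BitLinear implementation.

import torch

def hadamard_matrix(n: int) -> torch.Tensor:
    # Sylvester construction of an n x n Hadamard matrix (n must be a power of two).
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def hadamard_transform(x: torch.Tensor) -> torch.Tensor:
    # Normalized Hadamard transform along the hidden dimension; this rotation
    # flattens sharp, outlier-heavy activation distributions into more
    # Gaussian-like ones that are easier to quantize.
    n = x.shape[-1]
    H = hadamard_matrix(n).to(dtype=x.dtype, device=x.device)
    return x @ H / (n ** 0.5)

def quantize_activations_int4(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization to the 4-bit integer range [-8, 7] (an
    # assumed scheme), then dequantization so downstream ops stay in float.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 7.0
    return (x / scale).round().clamp(-8, 7) * scale

def h_bitlinear_forward(x: torch.Tensor, w_ternary: torch.Tensor) -> torch.Tensor:
    # Hypothetical H-BitLinear-style forward pass: rotate activations with the
    # Hadamard transform, quantize them to 4 bits, then apply ternary weights.
    x_rot = hadamard_transform(x)
    x_q = quantize_activations_int4(x_rot)
    return x_q @ w_ternary.t()

In an actual model, a module of this kind would stand in for the linear layers of the attention and feed-forward blocks, which is where the abstract says the activation outliers arise.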