BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
April 25, 2025
Authors: Hongyu Wang, Shuming Ma, Furu Wei
cs.AI
Abstract
Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by
activation outliers, which complicate quantization to low bit-widths. We
introduce BitNet v2, a novel framework enabling native 4-bit activation
quantization for 1-bit LLMs. To tackle outliers in attention and feed-forward
network activations, we propose H-BitLinear, a module applying an online
Hadamard transformation prior to activation quantization. This transformation
smooths sharp activation distributions into more Gaussian-like forms, suitable
for low-bit representation. Experiments show BitNet v2 trained from scratch
with 8-bit activations matches BitNet b1.58 performance. Crucially, BitNet v2
achieves minimal performance degradation when trained with native 4-bit
activations, significantly reducing memory footprint and computational cost for
batched inference.
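As a rough illustration of the mechanism the abstract describes, the sketch below applies a normalized Hadamard transform to activations before 4-bit quantization, then multiplies by ternary weights. This is a minimal sketch only: the function names, the per-token absmax quantization scheme, and the ternary-weight matmul are illustrative assumptions, not the paper's actual H-BitLinear implementation.

import torch

def hadamard_matrix(n: int) -> torch.Tensor:
    # Sylvester construction of an n x n Hadamard matrix (n must be a power of two).
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def hadamard_transform(x: torch.Tensor) -> torch.Tensor:
    # Normalized Hadamard transform along the hidden dimension; this rotation
    # flattens sharp, outlier-heavy activation distributions into more
    # Gaussian-like ones that are easier to quantize.
    n = x.shape[-1]
    H = hadamard_matrix(n).to(dtype=x.dtype, device=x.device)
    return x @ H / (n ** 0.5)

def quantize_activations_int4(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization to the 4-bit integer range [-8, 7] (an
    # assumed scheme), then dequantization so downstream ops stay in float.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 7.0
    return (x / scale).round().clamp(-8, 7) * scale

def h_bitlinear_forward(x: torch.Tensor, w_ternary: torch.Tensor) -> torch.Tensor:
    # Hypothetical H-BitLinear-style forward pass: rotate activations with the
    # Hadamard transform, quantize them to 4 bits, then apply ternary weights.
    x_rot = hadamard_transform(x)
    x_q = quantize_activations_int4(x_rot)
    return x_q @ w_ternary.t()

In an actual model, a module of this kind would stand in for the linear layers of the attention and feed-forward blocks, which is where the abstract says the activation outliers arise.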