BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Abstract

Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by activation outliers, which complicate quantization to low bit-widths. We introduce BitNet v2, a novel framework that enables native 4-bit activation quantization for 1-bit LLMs. To tackle outliers in attention and feed-forward network activations, we propose H-BitLinear, a module that applies an online Hadamard transformation prior to activation quantization. This transformation smooths sharp activation distributions into more Gaussian-like forms that are better suited to low-bit representation. Experiments show that BitNet v2, trained from scratch with 8-bit activations, matches the performance of BitNet b1.58. Crucially, BitNet v2 shows minimal performance degradation when trained with native 4-bit activations, significantly reducing memory footprint and computational cost for batched inference.
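
As a rough illustration of why a Hadamard transformation can help before low-bit quantization, the following Python sketch compares 4-bit quantization error on a small vector with one outlier, with and without the transform. It is not the H-BitLinear module itself: the function names are hypothetical, and the symmetric absmax INT4 quantizer is an assumed, generic scheme rather than one described in the abstract.

```python
import math


def hadamard_transform(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two).

    With the 1/sqrt(n) scaling the transform is orthonormal and its own inverse.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def quantize_int4_absmax(x):
    """Assumed scheme: symmetric per-vector absmax quantization, clipped to [-8, 7]."""
    absmax = max(abs(v) for v in x) or 1.0  # avoid division by zero on all-zero input
    scale = absmax / 7.0
    q = [max(-8, min(7, round(v / scale))) for v in x]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to floating-point values."""
    return [v * scale for v in q]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Toy activation vector: one large outlier next to small values.
x = [8.0, 0.5, 0.5, 0.5]

# Quantize directly: the outlier sets the scale, so the small values round to zero.
q_direct, s_direct = quantize_int4_absmax(x)
err_direct = mse(x, dequantize(q_direct, s_direct))

# Transform first: the outlier's energy is spread across all entries.
hx = hadamard_transform(x)
q_had, s_had = quantize_int4_absmax(hx)
# Apply the transform again (it is its own inverse) to compare in the original domain.
x_rec = hadamard_transform(dequantize(q_had, s_had))
err_had = mse(x, x_rec)

print("direct codes:  ", q_direct, "mse:", err_direct)
print("hadamard codes:", q_had, "mse:", err_had)
```

In this toy case the transformed vector uses more of the available 4-bit levels, so its reconstruction error is lower. The inverse transform appears here only to make the two errors comparable in the same domain; nothing in the abstract implies that the model applies one.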
