OneBit: Towards Extremely Low-bit Large Language Models

Abstract

Model quantization uses low bit-width values to represent the weight matrices of models. It is a promising approach to reducing both the storage and the computational overheads of deploying highly anticipated large language models (LLMs). However, existing quantization methods suffer severe performance degradation when the bit-width is extremely reduced, and therefore focus on quantizing models to 4-bit or 8-bit values. This paper boldly quantizes the weight matrices of LLMs to 1 bit, paving the way for extremely low bit-width deployment of LLMs. To this end, we introduce OneBit, a 1-bit quantization-aware training (QAT) framework. It includes a novel 1-bit parameter representation method that quantizes LLMs more effectively, and a parameter initialization method based on matrix decomposition that improves the convergence speed of QAT. Extensive experimental results indicate that OneBit achieves good performance (at least 83% of the non-quantized performance) with robust training processes when only 1-bit weight matrices are used.
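The abstract does not spell out the 1-bit representation or the decomposition used for initialization, so the following is only a minimal illustrative sketch under stated assumptions, not OneBit's exact formulation. It assumes that each weight matrix W is stored as its sign matrix (one bit per entry) plus two full-precision value vectors, `a` over rows and `b` over columns, so that W is approximated by sign(W) ⊙ (a bᵀ). It also assumes that the "matrix decomposition" is a rank-1 approximation of |W|, computed here by power iteration. All function names (`sign_matrix`, `rank1_abs`, `dequantize`, `linear`) are hypothetical.

```python
"""Hypothetical sketch: 1-bit sign matrix plus two full-precision value vectors.

Assumption (not taken from the paper): W ~= sign(W) * outer(a, b), with a and b
initialised from a rank-1 approximation of |W|.
"""
import math


def sign_matrix(w):
    # 1-bit part: map each weight to +1 or -1 (zero is treated as +1).
    return [[1.0 if x >= 0 else -1.0 for x in row] for row in w]


def rank1_abs(w, iters=50):
    # Power iteration on M = |W|, returning a, b with M ~= outer(a, b).
    m = [[abs(x) for x in row] for row in w]
    rows, cols = len(m), len(m[0])
    b = [1.0] * cols  # positive start keeps a and b non-negative
    for _ in range(iters):
        a = [sum(m[i][j] * b[j] for j in range(cols)) for i in range(rows)]
        norm_a = math.sqrt(sum(x * x for x in a)) or 1.0  # guard all-zero W
        a = [x / norm_a for x in a]
        # With ||a|| = 1, b = M^T a makes outer(a, b) the projection of M onto a.
        b = [sum(m[i][j] * a[i] for i in range(rows)) for j in range(cols)]
    return a, b


def dequantize(s, a, b):
    # Reconstruct W_hat[i][j] = s[i][j] * a[i] * b[j].
    return [[s[i][j] * a[i] * b[j] for j in range(len(b))] for i in range(len(a))]


def linear(x, s, a, b):
    # y = W_hat @ x without materialising W_hat:
    # scale the input by b, apply the +/-1 matrix, then scale the output by a.
    xs = [x[j] * b[j] for j in range(len(b))]
    return [a[i] * sum(s[i][j] * xs[j] for j in range(len(xs)))
            for i in range(len(a))]


if __name__ == "__main__":
    W = [[0.5, -1.0], [-0.25, 0.5]]  # |W| happens to be exactly rank 1
    s = sign_matrix(W)
    a, b = rank1_abs(W)
    print(dequantize(s, a, b))  # approximately W
    print(linear([1.0, 2.0], s, a, b))  # approximately [-1.5, 0.75]
```

In this sketch, the two value vectors scale the input and the output of the ±1 matrix product. As a result, the forward pass never needs a dequantized weight matrix, and storage amounts to one bit per weight plus `rows + cols` full-precision values. When |W| is not exactly rank 1, the reconstruction is only approximate. The assumed QAT stage would then fine-tune `a` and `b` (and the latent weights behind the signs) from this starting point.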