The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Abstract

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
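The "1.58" in the name can be read as the information content of one ternary weight: a value drawn from three states carries log2(3) ≈ 1.58 bits. The sketch below illustrates that arithmetic, along with one plausible reading of the "new computation paradigm": with weights restricted to {-1, 0, 1}, a matrix-vector product can be computed with additions and subtractions only. The names BITS_PER_TERNARY_WEIGHT and ternary_matvec are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math

# Information content of one weight that takes one of three values.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def ternary_matvec(weights, x):
    """Multiply a ternary matrix by a vector without any multiplications.

    `weights` is a list of rows whose entries are -1, 0, or 1 (an assumed,
    illustrative layout). Each output element is built by adding or
    subtracting input elements; zero weights are skipped entirely.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            elif w != 0:
                raise ValueError(f"weight {w!r} is not ternary")
        out.append(acc)
    return out


if __name__ == "__main__":
    print(round(BITS_PER_TERNARY_WEIGHT, 2))  # 1.58
    print(ternary_matvec([[1, 0, -1], [-1, 1, 1]], [2.0, 3.0, 5.0]))  # [-3.0, 6.0]
```

The loop is written for clarity rather than speed. A practical kernel would pack the weights and vectorize the accumulation, which is the kind of work the specialized hardware mentioned in the abstract would target.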
