
BitNet: Scaling 1-bit Transformers for Large Language Models

October 17, 2023
作者: Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
cs.AI

Abstract

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. Specifically, we introduce BitLinear as a drop-in replacement of the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results on language modeling show that BitNet achieves competitive performance while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. Furthermore, BitNet exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models while maintaining efficiency and performance benefits.
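The abstract describes BitLinear as a drop-in replacement for nn.Linear that trains 1-bit weights from scratch. As a rough illustration of that idea (not the paper's exact formulation), the sketch below shows a PyTorch layer that keeps latent full-precision weights, binarizes them in the forward pass, and routes gradients back through a straight-through estimator. The class name BitLinearSketch, the mean-centered binarization, and the per-tensor scaling are illustrative assumptions; the paper's BitLinear includes further details (such as activation quantization) that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Illustrative 1-bit linear layer (not the paper's exact BitLinear).

    Latent full-precision weights are binarized to +/-1, scaled by their
    mean absolute value, in the forward pass; a straight-through estimator
    routes gradients to the latent weights so they can be trained from scratch.
    """

    def __init__(self, in_features: int, out_features: int, bias: bool = False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean()                    # per-tensor scaling factor (assumption)
        w_bin = torch.sign(w - w.mean()) * scale  # binarize around the mean
        # Straight-through estimator: forward uses w_bin, backward sees w.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, w_q, self.bias)

# Drop-in usage, mirroring nn.Linear:
layer = BitLinearSketch(512, 1024)
y = layer(torch.randn(8, 512))
print(y.shape)  # torch.Size([8, 1024])
```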