
BitNet: Scaling 1-bit Transformers for Large Language Models

October 17, 2023
Authors: Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
cs.AI

Abstract

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. Specifically, we introduce BitLinear as a drop-in replacement of the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results on language modeling show that BitNet achieves competitive performance while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. Furthermore, BitNet exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models while maintaining efficiency and performance benefits.
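To make the idea of a "drop-in replacement of the nn.Linear layer" concrete, here is a minimal PyTorch sketch of a 1-bit linear layer trained from scratch. It is an illustration under simplifying assumptions, not the paper's exact BitLinear: the class name `BitLinearSketch` is hypothetical, and the actual BitNet layer also includes activation quantization and normalization that are omitted here. The sketch only shows the core pattern of keeping latent full-precision weights, binarizing them in the forward pass, and passing gradients through with a straight-through estimator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Illustrative 1-bit linear layer (not the paper's exact BitLinear):
    latent full-precision weights are binarized to {-1, +1} in the forward
    pass, and a straight-through estimator lets gradients update the
    latent weights during training."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x):
        w = self.weight
        # Center the weights, binarize with sign(), and rescale by the mean
        # absolute deviation so the binary weights keep a comparable scale.
        alpha = w.mean()
        beta = (w - alpha).abs().mean()
        w_bin = torch.sign(w - alpha) * beta
        # Straight-through estimator: forward uses the binary weights,
        # backward behaves as if the latent weights were used directly.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, w_q, self.bias)

# Usage: swap nn.Linear for the sketch inside a Transformer block.
layer = BitLinearSketch(512, 512)
y = layer(torch.randn(2, 16, 512))
```

Because the binarized weights are only materialized in the forward pass, an nn.Linear layer in an existing Transformer implementation can be replaced one-for-one, which is what allows training 1-bit weights from scratch rather than quantizing a pretrained model afterwards.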