How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
April 22, 2024
Authors: Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno
cs.AI
Abstract
Meta's LLaMA family has become one of the most powerful open-source Large
Language Model (LLM) series. Notably, LLaMA3 models have recently been released
and achieve impressive performance across various tasks thanks to super-large-scale
pre-training on over 15T tokens of data. Given the wide application of low-bit
quantization for LLMs in resource-limited scenarios, we explore LLaMA3's
capabilities when quantized to low bit-widths. This exploration holds the
potential to unveil new insights and challenges for low-bit quantization of
LLaMA3 and other forthcoming LLMs, especially in addressing the performance
degradation that LLM compression suffers from. Specifically, we evaluate
10 existing post-training quantization and LoRA fine-tuning methods on
LLaMA3 at 1-8 bits and on diverse datasets to comprehensively reveal LLaMA3's
low-bit quantization performance. Our experimental results indicate that LLaMA3
still suffers non-negligible degradation in these scenarios, especially at
ultra-low bit-widths. This highlights a significant performance gap at low
bit-widths that needs to be bridged in future developments. We expect that this
empirical study will prove valuable in advancing future models, pushing LLMs
to lower bit-widths with higher accuracy so that they become practical. Our
project is released at https://github.com/Macaronlin/LLaMA3-Quantization, and the
quantized LLaMA3 models are released at https://huggingface.co/LLMQ.
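To make concrete what quantizing weights to k bits means, and why degradation grows at ultra-low bit-widths, here is a minimal sketch of asymmetric (min-max) round-to-nearest (RTN) quantization applied per tensor to a single synthetic weight row. It is an illustration under stated assumptions, not the authors' code: the function names `quantize_rtn` and `mse`, the Gaussian weight distribution (std 0.02), and the row length of 4096 are all hypothetical choices, and the post-training quantization methods evaluated in the study are more sophisticated than plain per-tensor RTN.

```python
import random


def quantize_rtn(weights, bits):
    """Asymmetric (min-max) round-to-nearest quantization to `bits` bits.

    Returns the dequantized floats so reconstruction error can be measured.
    Illustrative sketch only (hypothetical helper, per-tensor granularity).
    """
    levels = 2 ** bits - 1  # largest integer code, e.g. 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    zero_point = round(-w_min / scale)  # integer code that represents 0.0
    dequant = []
    for w in weights:
        q = round(w / scale) + zero_point
        q = max(0, min(levels, q))  # clamp to the code range [0, 2^bits - 1]
        dequant.append((q - zero_point) * scale)
    return dequant


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Hypothetical weight row drawn from a narrow Gaussian, as a stand-in for real LLM weights.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {}
for bits in (8, 4, 3, 2, 1):
    errors[bits] = mse(weights, quantize_rtn(weights, bits))
    print(f"{bits}-bit RTN: MSE = {errors[bits]:.3e}")
```

Each bit removed roughly doubles the quantization step, so the reconstruction error grows by roughly a factor of four per bit; at 2 bits and below, only a handful of representable values remain, which is consistent with the sharp degradation the study reports at ultra-low bit-widths.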