How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

April 22, 2024
作者: Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno
cs.AI

Abstract

Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, the LLaMA3 models were recently released and achieve impressive performance across various benchmarks, thanks to super-large-scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when quantized to low bit-width. This exploration has the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation that arises in LLM compression. Specifically, we evaluate 10 existing post-training quantization and LoRA fine-tuning methods on LLaMA3 at 1-8 bits and on diverse datasets to comprehensively reveal LLaMA3's low-bit quantization performance. Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in these scenarios, especially at ultra-low bit-widths. This highlights a significant performance gap at low bit-width that needs to be bridged in future developments. We expect this empirical study to prove valuable in advancing future models, pushing LLMs to lower bit-widths with higher accuracy so that they become practical. Our project is released at https://github.com/Macaronlin/LLaMA3-Quantization and the quantized LLaMA3 models are released at https://huggingface.co/LLMQ.
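To make the core idea concrete, here is a minimal sketch of the simplest post-training quantization scheme, round-to-nearest (RTN) asymmetric quantization of a weight tensor. This is an illustrative assumption, not the paper's actual pipeline: the 10 evaluated methods are more sophisticated, but they all reduce to mapping float weights onto a small integer grid, and the sketch shows why error grows sharply at ultra-low bit-widths.

```python
import numpy as np

def quantize_rtn(weights: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round-to-nearest asymmetric quantization, then dequantization.

    Maps float weights onto the integer grid [0, 2**bits - 1] and back,
    so the returned tensor exhibits the precision loss a low-bit
    quantized model would carry at inference time.
    """
    qmax = 2 ** bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / qmax          # step size of the integer grid
    zero_point = np.round(-w_min / scale)   # integer offset for w_min
    q = np.clip(np.round(weights / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale         # dequantize back to float

# Compare quantization error at 4-bit vs 2-bit on a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
err4 = np.abs(w - quantize_rtn(w, bits=4)).mean()
err2 = np.abs(w - quantize_rtn(w, bits=2)).mean()
```

With only 4 levels available at 2-bit versus 16 at 4-bit, the mean reconstruction error grows substantially, mirroring the degradation the study reports at ultra-low bit-widths.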
