
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

April 22, 2024
作者: Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno
cs.AI

Abstract

Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, the recently released LLaMA3 models achieve impressive performance across various benchmarks, thanks to super-large-scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when quantized to low bit-widths. This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation problems that arise in LLM compression. Specifically, we evaluate 10 existing post-training quantization and LoRA-finetuning methods on LLaMA3 at 1-8 bits and on diverse datasets to comprehensively reveal LLaMA3's low-bit quantization performance. Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in these scenarios, especially at ultra-low bit-widths. This highlights the significant performance gap at low bit-widths that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, pushing LLMs toward lower bit-widths with higher accuracy so that they become practical. Our project is released at https://github.com/Macaronlin/LLaMA3-Quantization, and quantized LLaMA3 models are released at https://huggingface.co/LLMQ.
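To make the setting concrete, the simplest post-training quantization baseline is round-to-nearest (RTN) uniform quantization of the weight matrices. The sketch below is a minimal illustration of that idea (not the paper's evaluated methods): it quantizes a random weight matrix with symmetric per-row scales at several bit-widths and reports the resulting error, which grows sharply at ultra-low bit-widths. The function name `quantize_rtn` and the toy matrix are illustrative assumptions.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int) -> np.ndarray:
    """Round-to-nearest (RTN) uniform quantization of a weight matrix.

    Uses symmetric per-row (per-output-channel) scales and returns the
    dequantized weights, so the quantization error can be inspected directly.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)
for bits in (8, 4, 2):
    err = np.abs(w - quantize_rtn(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

The per-row scale keeps outlier weights in one channel from inflating the error of every other channel; the methods benchmarked in the paper (GPTQ, AWQ, and similar) refine this baseline with calibration data and error compensation.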
