Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
September 11, 2023
Authors: Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv
cs.AI
Abstract
Large Language Models (LLMs) have proven their exceptional capabilities in
performing language-related tasks. However, their deployment poses significant
challenges due to their considerable memory and storage requirements. In
response to this issue, weight-only quantization, particularly 3 and 4-bit
weight-only quantization, has emerged as one of the most viable solutions. As
the number of bits decreases, the quantization grid broadens, thus emphasizing
the importance of up and down rounding. While previous studies have
demonstrated that fine-tuning up and down rounding with the addition of
perturbations can enhance accuracy in some scenarios, our study is motivated by
the observation that these perturbations lie within a precise and limited range,
so only the threshold at which the rounding value flips is of significance. Consequently, we
propose a concise and highly effective approach for optimizing the weight
rounding task. Our method, named SignRound, involves lightweight block-wise
tuning using signed gradient descent, enabling us to achieve outstanding
results within 400 steps. SignRound outperforms the established baseline of
rounding-to-nearest (RTN) and competes impressively against recent methods,
without introducing additional inference overhead. The source code will be
publicly available at https://github.com/intel/neural-compressor soon.
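
The abstract only sketches the idea, so below is a minimal, hypothetical PyTorch illustration of tuning a rounding offset with signed gradient descent on calibration data. The function names, learning rate, offset bounds, and the use of a single linear layer (rather than block-wise tuning of a transformer block) are assumptions made for illustration, not the authors' released implementation.

```python
# A minimal sketch of signed-gradient-descent rounding tuning, in the spirit of
# SignRound. All names, hyperparameters, and the per-layer (not block-wise)
# granularity are illustrative assumptions.
import torch


def ste_round(x):
    # Round in the forward pass; pass gradients straight through in the backward pass.
    return (torch.round(x) - x).detach() + x


def quantize(weight, v, bits=4):
    # Symmetric per-output-channel weight quantization with a learnable
    # rounding offset v in [-0.5, 0.5] added before rounding.
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    q = torch.clamp(ste_round(weight / scale + v), -qmax - 1, qmax)
    return q * scale


def tune_rounding(weight, calib_inputs, fp_outputs, bits=4, steps=400, lr=2.5e-3):
    # Tune the rounding offset with signed gradient descent so the quantized
    # layer reproduces the full-precision outputs on calibration data.
    v = torch.zeros_like(weight, requires_grad=True)
    for _ in range(steps):
        q_weight = quantize(weight, v, bits)
        loss = torch.nn.functional.mse_loss(calib_inputs @ q_weight.t(), fp_outputs)
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()   # signed gradient descent step
            v.clamp_(-0.5, 0.5)       # flipping a rounding decision needs at most +/-0.5
            v.grad.zero_()
    return quantize(weight, v.detach(), bits)


# Usage: quantize one linear layer's weight against a small calibration batch.
weight = torch.randn(256, 512)           # (out_features, in_features)
calib_inputs = torch.randn(64, 512)      # calibration activations
fp_outputs = calib_inputs @ weight.t()   # full-precision reference outputs
q_weight = tune_rounding(weight, calib_inputs, fp_outputs)
```

Using only the sign of the gradient keeps each update step small and uniform, which suits an offset that is bounded to a half-grid range and only matters when it crosses a rounding threshold.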