Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
September 11, 2023
Authors: Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv
cs.AI
Abstract
Large Language Models (LLMs) have proven their exceptional capabilities in
performing language-related tasks. However, their deployment poses significant
challenges due to their considerable memory and storage requirements. In
response to this issue, weight-only quantization, particularly 3 and 4-bit
weight-only quantization, has emerged as one of the most viable solutions. As
the number of bits decreases, the quantization grid broadens, thus emphasizing
the importance of up and down rounding. While previous studies have
demonstrated that fine-tuning up and down rounding with the addition of
perturbations can enhance accuracy in some scenarios, our study is driven by
the precise and limited boundary of these perturbations, where only the
threshold for altering the rounding value is of significance. Consequently, we
propose a concise and highly effective approach for optimizing the weight
rounding task. Our method, named SignRound, involves lightweight block-wise
tuning using signed gradient descent, enabling us to achieve outstanding
results within 400 steps. SignRound outperforms the established baseline of
rounding-to-nearest (RTN) and competes impressively against recent methods,
without introducing additional inference overhead. The source code will be
publicly available at https://github.com/intel/neural-compressor soon.
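To make the idea concrete, below is a minimal, illustrative sketch of optimizing weight rounding with signed gradient descent. It is not the authors' implementation: the function names (`sign_round`, `quantize`, `ste_round`), the learning rate, and the single-layer MSE objective are assumptions for illustration, whereas SignRound tunes perturbations block-wise over transformer blocks with calibration data. The sketch learns a perturbation `V` bounded to [-0.5, 0.5] that only shifts the threshold between rounding up and rounding down, and updates it with the sign of the gradient for a fixed number of steps.

```python
# Illustrative sketch only (assumed names and hyperparameters), not the SignRound source code.
import torch

def ste_round(x):
    # Round in the forward pass, identity gradient in the backward pass
    # (straight-through estimator), so the perturbation V stays trainable.
    return (x.round() - x).detach() + x

def quantize(W, s, V, n_bits=4):
    # Symmetric weight-only quantization with a learnable rounding perturbation V.
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.clamp(ste_round(W / s + V), -qmax - 1, qmax)
    return q * s

def sign_round(W, s, X, n_bits=4, steps=400, lr=2.5e-3):
    """Tune V with signed gradient descent so the quantized weights
    reproduce the full-precision output X @ W.T on calibration inputs X."""
    V = torch.zeros_like(W, requires_grad=True)
    ref = (X @ W.t()).detach()                   # full-precision reference output
    for _ in range(steps):
        out = X @ quantize(W, s, V, n_bits).t()  # output with current rounding choices
        loss = torch.nn.functional.mse_loss(out, ref)
        loss.backward()
        with torch.no_grad():
            V -= lr * V.grad.sign()              # signed gradient step
            V.clamp_(-0.5, 0.5)                  # only the up/down rounding threshold can move
            V.grad.zero_()
    return quantize(W, s, V.detach(), n_bits)
```

Because `V` never leaves [-0.5, 0.5], the procedure can only flip individual weights between the two nearest grid points, which matches the abstract's observation that the rounding threshold, rather than an arbitrary perturbation, is what matters; the quantized weights it returns use the standard integer grid, so inference incurs no extra overhead.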