The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Abstract

Quantizing the weights of large language models (LLMs) from 16-bit to lower bit-widths is the de facto approach to deploying massive transformers on more affordable accelerators. GPTQ has emerged as one of the standard methods for one-shot post-training quantization at LLM scale. Yet its inner workings are described as a sequence of ad-hoc algebraic updates that obscure any geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last dimension to the first) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence rests on a sophisticated mathematical argument and has two analytical consequences: (i) the GPTQ error propagation step gains an intuitive geometric interpretation; (ii) GPTQ inherits the error upper bound of Babai's algorithm under the no-clipping condition. Taken together, these results place GPTQ on firm theoretical footing and open the door to importing decades of progress in lattice algorithms into the design of future quantization algorithms for billion-parameter models.
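For readers unfamiliar with the lattice side of this equivalence, the following is a minimal sketch of Babai's nearest plane algorithm in its textbook form, written with NumPy. It is not the authors' implementation and does not reproduce GPTQ itself; the function name `babai_nearest_plane` and its arguments are illustrative assumptions. The basis vectors are the columns of `basis`, and the loop runs from the last dimension to the first, mirroring the back-to-front order described above.

```python
import numpy as np


def babai_nearest_plane(basis, target):
    """Return integer coefficients c such that basis @ c is close to target.

    basis  : (m, n) array whose columns are linearly independent lattice vectors.
    target : (m,) array, the point to approximate.
    Hypothetical helper for illustration only.
    """
    basis = np.asarray(basis, dtype=float)
    target = np.asarray(target, dtype=float)

    # Reduced QR: basis = q @ r with r upper triangular. The diagonal of r
    # holds the (signed) lengths of the Gram-Schmidt vectors.
    q, r = np.linalg.qr(basis)

    # Coordinates of the target in the Gram-Schmidt frame.
    residual = q.T @ target

    n = basis.shape[1]
    coeffs = np.zeros(n, dtype=np.int64)

    # Walk from the last basis vector to the first.
    for j in range(n - 1, -1, -1):
        # Pick the nearest hyperplane along the j-th Gram-Schmidt direction.
        c = np.rint(residual[j] / r[j, j])
        coeffs[j] = int(c)
        # Propagate the rounding decision to the remaining coordinates.
        residual -= c * r[:, j]

    return coeffs
```

After each step the remaining coordinate along the j-th Gram-Schmidt direction is at most half the length of that direction, which is where the classical error bound of the algorithm comes from. In the quantization setting one would presumably take the basis from a factor of the Hessian of the layer's inputs, scaled by the quantization step, and the target from the full-precision weights; that mapping is an assumption here, since the abstract does not spell it out.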

