The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Abstract

Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidths is the de facto approach to deploying massive transformers on more affordable accelerators. GPTQ has emerged as one of the standard methods for one-shot post-training quantization at LLM scale. Yet its inner workings are described as a sequence of ad-hoc algebraic updates that obscure any geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last dimension to the first) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence rests on a sophisticated mathematical argument and has two analytical consequences: (i) the GPTQ error propagation step gains an intuitive geometric interpretation; (ii) GPTQ inherits the error upper bound of Babai's algorithm under the no-clipping condition. Taken together, these results place GPTQ on firm theoretical footing and open the door to importing decades of progress in lattice algorithms into the design of future quantization algorithms for billion-parameter models.
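For readers unfamiliar with the lattice side of this equivalence, the following is a minimal sketch of the textbook form of Babai's nearest plane algorithm, processing dimensions from last to first. It is not GPTQ itself and not the paper's implementation: the function names (`dot`, `gram_schmidt`, `babai_nearest_plane`) and the toy basis are assumptions made for illustration, and the construction of the lattice from the layer's Hessian is not shown because the abstract does not specify it.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Orthogonalize the basis vectors in order, without normalizing them."""
    ortho = []
    for b in basis:
        v = list(b)
        for o in ortho:
            mu = dot(v, o) / dot(o, o)
            v = [vi - mu * oi for vi, oi in zip(v, o)]
        ortho.append(v)
    return ortho


def babai_nearest_plane(basis, target):
    """Approximate the closest lattice vector to `target`.

    Returns (coeffs, residual), where sum_i coeffs[i] * basis[i] is the
    chosen lattice vector and residual = target - that lattice vector.
    """
    ortho = gram_schmidt(basis)
    residual = list(target)
    coeffs = [0] * len(basis)
    for i in range(len(basis) - 1, -1, -1):  # last dimension first
        # Choose the nearest translate of the hyperplane spanned by basis[:i].
        c = round(dot(residual, ortho[i]) / dot(ortho[i], ortho[i]))
        coeffs[i] = c
        # Propagate the rounding decision to the remaining dimensions.
        residual = [r - c * b for r, b in zip(residual, basis[i])]
    return coeffs, residual


# Hypothetical toy example: a 2-D lattice and an off-lattice target.
basis = [[2.0, 0.0], [1.0, 3.0]]
coeffs, residual = babai_nearest_plane(basis, [4.4, 7.2])
print(coeffs)  # [1, 2], i.e. the lattice point [4, 6]
```

For a target in the span of the basis, each rounding step leaves at most half of the corresponding Gram-Schmidt vector in the residual, so the squared residual norm is at most one quarter of the sum of the squared Gram-Schmidt norms. This is the standard worst-case bound for the nearest plane algorithm, and it is the kind of guarantee the abstract says GPTQ inherits when no clipping occurs.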
