SigmaScale: LLM Compression Using SVD-Based Low-Rank Decomposition and Learned Scaling Matrices

Abstract

We present SigmaScale, a method for learning auxiliary scaling matrices S to aid truncated Singular Value Decomposition (SVD) based compression of Large Language Models (LLMs). Instead of deriving the scaling matrices analytically, SigmaScale optimizes two sets of vectors that define diagonal row and column scaling transformations under an activation-aware compression loss. We show that the learned scaling lowers the effective intrinsic rank of the weight matrices, as reflected by reductions in effective-rank entropy, and that this reduction is strongly correlated with the compression loss. Experiments on Llama 3.1 8B Instruct and Qwen3-8B show that SigmaScale is competitive with closely related state-of-the-art SVD-based compression methods on perplexity and zero-shot benchmarks. Because its activation-aware transformations are learned, SigmaScale adapts to the structure of each model's weights, offering a more flexible route to low-rank LLM compression. The gains observed on specific tasks suggest that the approach is a viable option for applications that need to reduce LLM inference cost.
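The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the form D_r W D_c for the scaled matrix, the relative normalization of the loss, and the use of the exponential of the singular-value entropy as the effective rank are all assumptions made for illustration. The optimization of the scaling vectors is omitted; they are passed in as given positive vectors.

```python
import numpy as np


def effective_rank(W, eps=1e-12):
    """Effective rank as exp(entropy) of the normalized singular values.

    Assumed definition; the paper's exact entropy measure may differ.
    """
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries before taking the log
    return float(np.exp(-(p * np.log(p)).sum()))


def scaled_truncated_svd(W, row_scale, col_scale, k):
    """Rank-k approximation of W computed in a diagonally scaled space.

    W has shape (out, in); row_scale (out,) and col_scale (in,) are positive
    vectors standing in for the learned scaling (hypothetical names).
    """
    scaled = row_scale[:, None] * W * col_scale[None, :]  # D_r W D_c
    U, s, Vt = np.linalg.svd(scaled, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]  # keep the top-k components
    # Undo the diagonal scaling so the result approximates W itself.
    return low_rank / row_scale[:, None] / col_scale[None, :]


def activation_aware_loss(W, W_hat, X):
    """Relative output error ||(W - W_hat) X||_F^2 / ||W X||_F^2.

    X has shape (in, n_samples) and holds calibration activations.
    """
    return np.linalg.norm((W - W_hat) @ X) ** 2 / np.linalg.norm(W @ X) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 48))
    X = rng.standard_normal((48, 256)) * rng.uniform(0.1, 5.0, (48, 1))
    # Placeholder scaling: column scale taken from activation magnitude.
    row_scale = np.ones(64)
    col_scale = np.abs(X).mean(axis=1)
    W_hat = scaled_truncated_svd(W, row_scale, col_scale, k=16)
    print("effective rank of W:", effective_rank(W))
    print("relative output error:", activation_aware_loss(W, W_hat, X))
```

In a learned variant, row_scale and col_scale would be the trainable parameters and activation_aware_loss the objective. Because diagonal scaling by nonzero entries preserves matrix rank, the returned approximation still has rank at most k and can be stored as two thin factors.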