Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

Abstract


Because of the sheer scale of Large Language Models (LLMs), conventional compression methods cannot be applied to them directly: even minimal gradient updates are computationally demanding, particularly on consumer-grade hardware. This paper introduces a parametric and practical approach to compressing LLMs based on reduced order modelling, which combines low-rank decomposition in the feature space with re-parameterization in the weight space. The technique operates layer-wise, requires no GPU, and can compress billion-scale models under stringent memory and time constraints. By leveraging matrix decomposition, our method advances model compression and outperforms the prevailing state-of-the-art structured pruning method.
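To make the pairing of feature-space decomposition and weight-space re-parameterization concrete, here is a minimal NumPy sketch for a single bias-free linear layer. It is an illustrative assumption, not the paper's implementation: it finds the top principal directions of the layer's output features on a calibration batch (using the uncentered second-moment matrix) and folds them into the weights, replacing W with two thinner matrices. The function name `compress_linear_layer` and the `rank` parameter are hypothetical.

```python
import numpy as np

def compress_linear_layer(W, X, rank):
    """Hypothetical sketch: replace y = W @ x with y ≈ W2 @ (W1 @ x).

    W:    (d_out, d_in) weights of a bias-free linear layer
    X:    (d_in, n) calibration inputs reaching this layer
    rank: number of principal feature directions to keep (assumed hyperparameter)
    """
    Y = W @ X                              # output features on calibration data, (d_out, n)
    C = (Y @ Y.T) / Y.shape[1]             # uncentered second-moment matrix of the features
    _, eigvecs = np.linalg.eigh(C)         # eigh sorts eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :rank]         # top-`rank` principal directions, (d_out, rank)
    W1 = V.T @ W                           # re-parameterized down-projection, (rank, d_in)
    W2 = V                                 # up-projection back to d_out, (d_out, rank)
    return W1, W2

# Usage on random data (illustrative only)
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
X = rng.standard_normal((64, 512))
W1, W2 = compress_linear_layer(W, X, rank=16)
print(W.size, W1.size + W2.size)  # 4096 2048
```

Each layer is handled independently from its own calibration features, so the sketch needs only CPU linear algebra, in line with the layer-wise, GPU-free setting described above. Note that the factorization saves parameters only when rank × (d_in + d_out) < d_in × d_out; keeping every direction (rank = d_out) reproduces W exactly.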
