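MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Abstract

Post-training quantization (PTQ) with computational invariance has made remarkable progress for Large Language Models (LLMs), but applying it to Multimodal Large Language Models (MLLMs) poses substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address them, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance on both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive among state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.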
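As a rough illustration of the two issues named above, the following NumPy sketch applies the standard SmoothQuant per-channel rule, s_j = max|X_j|^α / max|W_j|^(1-α), to two synthetic modalities. It is not the MASQuant implementation: the factors here are computed in closed form rather than learned, and the tensor shapes, the 20x activation-range gap between "vision" and "text" tokens, and all variable names are assumptions made for the example.

```python
import numpy as np

def smoothing_factors(x, w, alpha=0.5, eps=1e-8):
    """SmoothQuant-style factors, one per input channel.

    x: activations, shape (tokens, c_in); w: weights, shape (c_in, c_out).
    """
    act_max = np.maximum(np.abs(x).max(axis=0), eps)  # per-channel activation range
    w_max = np.maximum(np.abs(w).max(axis=1), eps)    # per-channel weight range
    return act_max ** alpha / w_max ** (1 - alpha)

rng = np.random.default_rng(0)
c_in, c_out = 8, 4
w = rng.normal(size=(c_in, c_out))
x_text = rng.normal(size=(16, c_in))
# Assumption for illustration: vision tokens have a much larger activation range.
x_vision = rng.normal(size=(16, c_in)) * 20.0

s_text = smoothing_factors(x_text, w)
s_vision = smoothing_factors(x_vision, w)
s_shared = smoothing_factors(np.vstack([x_text, x_vision]), w)

# Computational invariance: (X / s) @ (diag(s) W) equals X @ W.
y_ref = x_text @ w
y_smooth = (x_text / s_text) @ (s_text[:, None] * w)
invariance_holds = np.allclose(y_ref, y_smooth)

# A single shared factor is set by whichever modality has the larger range
# in each channel, so the other modality is scaled by a factor not fitted to it.
shared_is_dominated = np.allclose(s_shared, np.maximum(s_text, s_vision))

# Separate factors imply different scaled weights per modality, so one set of
# quantized weights no longer serves both without some form of compensation.
w_text = s_text[:, None] * w
w_vision = s_vision[:, None] * w
weights_differ = not np.allclose(w_text, w_vision)
```

Under these assumptions, the shared factors coincide with the channel-wise maximum of the per-modality factors, which is one plausible reading of Smoothing Misalignment. The per-modality scaled weights differ, which is the gap that a compensation step such as CMC would need to close.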
