MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Abstract

Post-training quantization (PTQ) methods based on computational invariance have advanced remarkably for Large Language Models (LLMs); however, applying them to Multimodal Large Language Models (MLLMs) poses substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multimodal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance on both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
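The abstract does not give the equations behind MAS and CMC, so the following NumPy sketch only illustrates the two ideas under stated assumptions; it is not the MASQuant implementation. It assumes (1) the SmoothQuant factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), computed separately for each modality from that modality's calibration activations; (2) a single shared weight that absorbs the text factors, diag(s_t) W, so that vision tokens smoothed with their own factors need a correction Delta = diag(s_v - s_t) W to keep X W unchanged; and (3) a compensation that approximates Delta by a rank-r product using a Cholesky-whitened truncated SVD, which minimizes the error weighted by the vision activations. The helper names `smoothing_factors` and `whitened_low_rank`, the random data, the rank, and alpha = 0.5 are hypothetical, and the actual low-bit rounding step is omitted.

```python
# Illustrative sketch with hypothetical helpers; not the MASQuant reference implementation.
import numpy as np


def smoothing_factors(x, w, alpha=0.5, eps=1e-8):
    """SmoothQuant-style per-input-channel factors: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    x: (tokens, c_in) calibration activations of ONE modality; w: (c_in, c_out) weight.
    """
    act_max = np.maximum(np.abs(x).max(axis=0), eps)  # per input channel, over tokens
    w_max = np.maximum(np.abs(w).max(axis=1), eps)    # per input channel, over outputs
    return act_max**alpha / w_max ** (1 - alpha)


def whitened_low_rank(delta, x_s, rank, ridge=1e-6):
    """Rank-r factors (a, b) approximately minimizing ||x_s @ (delta - a @ b)||_F.

    Whitening: with gram = x_s.T @ x_s = C C^T (Cholesky), ||x_s @ M||_F = ||C^T @ M||_F,
    so a truncated SVD of C^T @ delta is optimal in the activation-weighted norm.
    A tiny ridge keeps the Cholesky factorization well-defined.
    """
    gram = x_s.T @ x_s
    gram = gram + ridge * np.trace(gram) / gram.shape[0] * np.eye(gram.shape[0])
    c = np.linalg.cholesky(gram)                        # lower-triangular, gram = c @ c.T
    u, sv, vt = np.linalg.svd(c.T @ delta, full_matrices=False)
    a = np.linalg.solve(c.T, u[:, :rank] * sv[:rank])  # C^{-T} U_r S_r
    b = vt[:rank]                                       # V_r^T
    return a, b


rng = np.random.default_rng(0)
c_in, c_out, rank = 64, 32, 8
w = rng.normal(size=(c_in, c_out))
x_text = rng.normal(size=(256, c_in))
# Hypothetical vision activations whose channels have very different scales (outliers).
x_vis = rng.normal(size=(256, c_in)) * rng.uniform(0.1, 20.0, size=c_in)

s_t = smoothing_factors(x_text, w)  # text-specific factors
s_v = smoothing_factors(x_vis, w)   # vision-specific factors

w_shared = s_t[:, None] * w       # single stored weight diag(s_t) W (the one that would be quantized)
x_vis_s = x_vis / s_v             # vision activations smoothed with their own factors
delta = (s_v - s_t)[:, None] * w  # diag(s_v) W - diag(s_t) W: what the vision path is missing

y_ref = x_vis @ w
y_none = x_vis_s @ w_shared  # no compensation: invariance is broken for vision tokens
a, b = whitened_low_rank(delta, x_vis_s, rank)
y_white = x_vis_s @ (w_shared + a @ b)  # whitened rank-r compensation

u2, s2, vt2 = np.linalg.svd(delta, full_matrices=False)
y_plain = x_vis_s @ (w_shared + (u2[:, :rank] * s2[:rank]) @ vt2[:rank])  # unwhitened SVD

for name, y in [("none", y_none), ("plain SVD", y_plain), ("whitened SVD", y_white)]:
    print(f"{name:>12}: relative error {np.linalg.norm(y - y_ref) / np.linalg.norm(y_ref):.4f}")
```

At the same rank, the whitened factorization should leave a smaller output error on the vision activations than a plain SVD of Delta, because it weights each direction of Delta by how strongly the calibration activations excite it. This is the general intuition for whitening, not a statement about MASQuant's exact objective.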
