

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

March 5, 2026
Auteurs: Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao
cs.AI

Abstract

Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive among state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
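To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation: modality-specific SmoothQuant-style smoothing factors (one per modality, avoiding a single shared factor that misaligns across modalities) and a top-rank SVD truncation of a cross-modal activation difference as a stand-in for the low-rank forms used by CMC. All function names, the `alpha` hyperparameter, and the shape conventions are illustrative assumptions.

```python
import numpy as np

def modality_aware_smoothing(acts_by_modality, W, alpha=0.5, eps=1e-8):
    """Illustrative SmoothQuant-style smoothing, computed per modality.

    For each modality, a per-channel factor
        s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    migrates activation outliers into the weights. W has shape
    (out_features, in_features); each X has shape (tokens, in_features).
    """
    w_max = np.abs(W).max(axis=0) + eps           # per input channel of W
    factors = {}
    for modality, X in acts_by_modality.items():
        a_max = np.abs(X).max(axis=0) + eps       # per input channel of X
        factors[modality] = a_max**alpha / w_max**(1.0 - alpha)
    return factors

def low_rank_compensation(delta, rank=2):
    """Keep only the top-`rank` SVD components of a cross-modal
    activation difference, i.e. a low-rank approximation of `delta`
    (a rough analogue of the SVD-whitening step described for CMC)."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```

Smoothing is an exact computational invariance for a linear layer: `(X / s) @ (W * s).T` equals `X @ W.T`, so the scaling changes the quantization-friendliness of the tensors without changing the layer's output.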