

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

March 5, 2026
Authors: Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao
cs.AI

Abstract

Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive among the state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
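To make the Modality-Aware Smoothing idea concrete, here is a minimal sketch assuming a SmoothQuant-style per-channel factor s = max|X|^α / max|W|^(1−α), computed separately per modality rather than once over mixed activations. Function names, shapes, and the `alpha` parameter are illustrative, not the authors' implementation.

```python
import numpy as np

def smoothing_factor(act_absmax, w_absmax, alpha=0.5):
    # SmoothQuant-style per-channel factor: s = |X|max^alpha / |W|max^(1-alpha)
    return act_absmax ** alpha / w_absmax ** (1 - alpha)

def modality_aware_smooth(acts_by_modality, weight, alpha=0.5):
    """Learn one smoothing factor per modality instead of a shared one.

    acts_by_modality: dict of modality name -> activations, shape (tokens, C_in)
    weight: linear-layer weight, shape (C_in, C_out)
    Returns {modality: (smoothed activations, matching scaled weight)}.
    """
    w_absmax = np.abs(weight).max(axis=1)  # per input channel, shape (C_in,)
    out = {}
    for name, x in acts_by_modality.items():
        s = smoothing_factor(np.abs(x).max(axis=0), w_absmax, alpha)
        # X' = X / s and W' = diag(s) @ W, so X' @ W' == X @ W exactly
        out[name] = (x / s, weight * s[:, None])
    return out
```

Because each modality migrates its own outlier magnitude into the weights, the scaling stays computationally invariant per modality (X' @ W' equals X @ W), while a single shared factor would be misaligned for at least one modality's activation statistics.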
PDF · March 9, 2026