

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

March 5, 2026
Authors: Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao
cs.AI

Abstract

Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive among the state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
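To make the Modality-Aware Smoothing idea concrete, here is a minimal sketch assuming a SmoothQuant-style per-channel factor s = max|X|^α / max|W|^(1−α), computed separately per modality rather than once over mixed activations. Function names, shapes, and the `alpha` parameter are illustrative, not the authors' implementation.

```python
import numpy as np

def smoothing_factor(act_absmax, w_absmax, alpha=0.5):
    # SmoothQuant-style per-channel factor: s = |X|max^alpha / |W|max^(1-alpha)
    return act_absmax ** alpha / w_absmax ** (1 - alpha)

def modality_aware_smooth(acts_by_modality, weight, alpha=0.5):
    """Learn one smoothing factor per modality instead of a shared one.

    acts_by_modality: dict of modality name -> activations, shape (tokens, C_in)
    weight: linear-layer weight, shape (C_in, C_out)
    Returns {modality: (smoothed activations, matching scaled weight)}.
    """
    w_absmax = np.abs(weight).max(axis=1)  # per input channel, shape (C_in,)
    out = {}
    for name, x in acts_by_modality.items():
        s = smoothing_factor(np.abs(x).max(axis=0), w_absmax, alpha)
        # X' = X / s and W' = diag(s) @ W, so X' @ W' == X @ W exactly
        out[name] = (x / s, weight * s[:, None])
    return out
```

Because each modality migrates its own outlier magnitude into the weights, the scaling stays computationally invariant per modality (X' @ W' equals X @ W), while a single shared factor would be misaligned for at least one modality's activation statistics.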
PDF · March 9, 2026