Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
August 20, 2025
Authors: Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun
cs.AI
Abstract
Recent advances in diffusion large language models (dLLMs) have introduced a
promising alternative to autoregressive (AR) LLMs for natural language
generation tasks, leveraging full attention and denoising-based decoding
strategies. However, the deployment of these models on edge devices remains
challenging due to their massive parameter scale and high resource demands.
While post-training quantization (PTQ) has emerged as a widely adopted
technique for compressing AR LLMs, its applicability to dLLMs remains largely
unexplored. In this work, we present the first systematic study on quantizing
diffusion-based language models. We begin by identifying the presence of
activation outliers, characterized by abnormally large activation values that
dominate the dynamic range. These outliers pose a key challenge to low-bit
quantization, as they make it difficult to preserve precision for the majority
of values. More importantly, we implement state-of-the-art PTQ methods and
conduct a comprehensive evaluation across multiple task types and model
variants. Our analysis is structured along four key dimensions: bit-width,
quantization method, task category, and model type. Through this
multi-perspective evaluation, we offer practical insights into the quantization
behavior of dLLMs under different configurations. We hope our findings provide
a foundation for future research in efficient dLLM deployment. All code and
experimental setups will be released to support the community.
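To make the outlier issue described in the abstract concrete, the following minimal sketch (not taken from the paper, and independent of any specific dLLM) shows how a single abnormally large activation inflates the per-tensor scale under symmetric min-max quantization, degrading precision for the bulk of values, especially at low bit-widths. The function names, the synthetic Gaussian activations, and the injected outlier value of 100.0 are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's method): a single activation
# outlier inflates the per-tensor min-max quantization scale, so the quantization
# step for all other values grows and low-bit precision collapses.
import numpy as np

def quantize_dequantize(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Symmetric per-tensor min-max quantization followed by dequantization."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 127 for INT8, 7 for INT4
    scale = np.abs(x).max() / qmax          # scale is set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=4096)      # "typical" activations (assumed Gaussian)
acts_outlier = acts.copy()
acts_outlier[0] = 100.0                     # one abnormally large outlier (hypothetical)

for name, x in [("no outlier", acts), ("with outlier", acts_outlier)]:
    for bits in (8, 4):
        mse = np.mean((x - quantize_dequantize(x, bits)) ** 2)
        print(f"{name:12s} INT{bits}: MSE = {mse:.6f}")
```

With the outlier present, the per-tensor scale grows by roughly two orders of magnitude, so the rounding error on the remaining values grows by the same factor; the effect is mild at INT8 but severe at INT4, which is why outlier handling is central to low-bit PTQ for dLLMs as for AR LLMs.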