

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

August 20, 2025
作者: Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun
cs.AI

Abstract

Recent advances in diffusion large language models (dLLMs) have introduced a promising alternative to autoregressive (AR) LLMs for natural language generation tasks, leveraging full attention and denoising-based decoding strategies. However, the deployment of these models on edge devices remains challenging due to their massive parameter scale and high resource demands. While post-training quantization (PTQ) has emerged as a widely adopted technique for compressing AR LLMs, its applicability to dLLMs remains largely unexplored. In this work, we present the first systematic study on quantizing diffusion-based language models. We begin by identifying the presence of activation outliers, characterized by abnormally large activation values that dominate the dynamic range. These outliers pose a key challenge to low-bit quantization, as they make it difficult to preserve precision for the majority of values. More importantly, we implement state-of-the-art PTQ methods and conduct a comprehensive evaluation across multiple task types and model variants. Our analysis is structured along four key dimensions: bit-width, quantization method, task category, and model type. Through this multi-perspective evaluation, we offer practical insights into the quantization behavior of dLLMs under different configurations. We hope our findings provide a foundation for future research in efficient dLLM deployment. All code and experimental setups will be released to support the community.
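To make the outlier issue concrete, here is a minimal, hypothetical sketch (not code from the paper) of symmetric per-tensor fake quantization in PyTorch. It illustrates how a single abnormally large activation value inflates the quantization scale and erases precision for the remaining values; the tensor shapes, outlier magnitude, and the `quantize_symmetric` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal illustration (assumed setup, not from the paper) of how an
# activation outlier inflates the dynamic range under symmetric
# per-tensor quantization.
import torch

def quantize_symmetric(x: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Fake-quantize x with a single per-tensor scale (round-to-nearest)."""
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 7 for INT4
    scale = x.abs().max() / qmax           # scale is dictated by the largest value
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                       # dequantized approximation

torch.manual_seed(0)
act = torch.randn(4096)                    # "typical" activations
act_outlier = act.clone()
act_outlier[0] = 100.0                     # inject one abnormally large value

for name, x in [("no outlier", act), ("with outlier", act_outlier)]:
    x_hat = quantize_symmetric(x, n_bits=4)
    # Measure error only on the ordinary values (exclude the injected outlier)
    err = (x_hat[1:] - x[1:]).abs().mean()
    print(f"{name:12s} mean abs error on normal values: {err:.4f}")
```

With the outlier present, the per-tensor scale grows by roughly two orders of magnitude, so most of the near-Gaussian activations round to zero; without it, the same 4-bit scheme reconstructs them far more faithfully. This is the effect that outlier-aware PTQ methods evaluated in the paper aim to mitigate.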