Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
August 20, 2025
Authors: Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun
cs.AI
Abstract
Recent advances in diffusion large language models (dLLMs) have introduced a
promising alternative to autoregressive (AR) LLMs for natural language
generation tasks, leveraging full attention and denoising-based decoding
strategies. However, the deployment of these models on edge devices remains
challenging due to their massive parameter scale and high resource demands.
While post-training quantization (PTQ) has emerged as a widely adopted
technique for compressing AR LLMs, its applicability to dLLMs remains largely
unexplored. In this work, we present the first systematic study on quantizing
diffusion-based language models. We begin by identifying the presence of
activation outliers, characterized by abnormally large activation values that
dominate the dynamic range. These outliers pose a key challenge to low-bit
quantization, as they make it difficult to preserve precision for the majority
of values. More importantly, we implement state-of-the-art PTQ methods and
conduct a comprehensive evaluation across multiple task types and model
variants. Our analysis is structured along four key dimensions: bit-width,
quantization method, task category, and model type. Through this
multi-perspective evaluation, we offer practical insights into the quantization
behavior of dLLMs under different configurations. We hope our findings provide
a foundation for future research in efficient dLLM deployment. All code and
experimental setups will be released to support the community.
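To make the outlier issue described in the abstract concrete, the following minimal sketch (not taken from the paper, and independent of any specific dLLM) shows how a single abnormally large activation inflates the per-tensor scale under symmetric min-max quantization, degrading precision for the bulk of values, especially at low bit-widths. The function names, the synthetic Gaussian activations, and the injected outlier value of 100.0 are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's method): a single activation
# outlier inflates the per-tensor min-max quantization scale, so the quantization
# step for all other values grows and low-bit precision collapses.
import numpy as np

def quantize_dequantize(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Symmetric per-tensor min-max quantization followed by dequantization."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 127 for INT8, 7 for INT4
    scale = np.abs(x).max() / qmax          # scale is set by the largest magnitude
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=4096)      # "typical" activations (assumed Gaussian)
acts_outlier = acts.copy()
acts_outlier[0] = 100.0                     # one abnormally large outlier (hypothetical)

for name, x in [("no outlier", acts), ("with outlier", acts_outlier)]:
    for bits in (8, 4):
        mse = np.mean((x - quantize_dequantize(x, bits)) ** 2)
        print(f"{name:12s} INT{bits}: MSE = {mse:.6f}")
```

With the outlier present, the per-tensor scale grows by roughly two orders of magnitude, so the rounding error on the remaining values grows by the same factor; the effect is mild at INT8 but severe at INT4, which is why outlier handling is central to low-bit PTQ for dLLMs as for AR LLMs.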