An Empirical Study of Qwen3 Quantization
May 4, 2025
Authors: Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, Xianglong Liu
cs.AI
Abstract
The Qwen series has emerged as a leading family of open-source Large Language
Models (LLMs), demonstrating remarkable capabilities in natural language
understanding tasks. With the recent release of Qwen3, which exhibits superior
performance across diverse benchmarks, there is growing interest in deploying
these models efficiently in resource-constrained environments. Low-bit
quantization presents a promising solution, yet its impact on Qwen3's
performance remains underexplored. This study conducts a systematic evaluation
of Qwen3's robustness under various quantization settings, aiming to uncover
both opportunities and challenges in compressing this state-of-the-art model.
We rigorously assess five classic post-training quantization techniques
applied to Qwen3, spanning bit-widths from 1 to 8 bits, and evaluate their
effectiveness across multiple datasets. Our findings reveal that while Qwen3
maintains competitive performance at moderate bit-widths, it experiences
notable degradation in linguistic tasks under ultra-low precision, underscoring
the persistent hurdles in LLM compression. These results emphasize the need for
further research to mitigate performance loss in extreme quantization
scenarios. We anticipate that this empirical analysis will provide actionable
insights for advancing quantization methods tailored to Qwen3 and future LLMs,
ultimately enhancing their practicality without compromising accuracy. Our
project is released on https://github.com/Efficient-ML/Qwen3-Quantization and
https://huggingface.co/collections/Efficient-ML/qwen3-quantization-68164450decb1c868788cb2b.
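To make the trade-off the abstract describes concrete, below is a minimal sketch of uniform, per-channel, symmetric round-to-nearest weight quantization — the simplest form of post-training quantization. This is an illustrative assumption, not a reproduction of any of the five methods evaluated in the study; function names (`quantize_rtn`, `dequantize`) and the random test weights are hypothetical. It shows how reconstruction error grows as the bit-width shrinks toward the ultra-low-precision regime where the paper reports notable degradation.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int):
    """Per-channel symmetric round-to-nearest quantization.

    w: (out_features, in_features) float weight matrix.
    Returns (q, scale): signed integers in [-2**(bits-1), 2**(bits-1)-1]
    plus one float scale per output channel (row).
    """
    qmax = 2 ** (bits - 1) - 1
    # One scale per row, chosen so the largest-magnitude weight maps to qmax.
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

# Toy weights: error should rise monotonically as bit-width drops.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
for bits in (8, 4, 2):
    w_hat = dequantize(*quantize_rtn(w, bits))
    print(f"{bits}-bit mean abs error: {np.abs(w - w_hat).mean():.4f}")
```

At 8 bits the reconstruction is near-lossless, while at 2 bits only four distinct levels remain per channel, which is one intuition for why the paper finds sharp drops on linguistic tasks at ultra-low precision.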