An Empirical Study of Qwen3 Quantization
May 4, 2025
Authors: Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, Xianglong Liu
cs.AI
Abstract
The Qwen series has emerged as a leading family of open-source Large Language
Models (LLMs), demonstrating remarkable capabilities in natural language
understanding tasks. With the recent release of Qwen3, which exhibits superior
performance across diverse benchmarks, there is growing interest in deploying
these models efficiently in resource-constrained environments. Low-bit
quantization presents a promising solution, yet its impact on Qwen3's
performance remains underexplored. This study conducts a systematic evaluation
of Qwen3's robustness under various quantization settings, aiming to uncover
both opportunities and challenges in compressing this state-of-the-art model.
We rigorously assess 5 existing classic post-training quantization techniques
applied to Qwen3, spanning bit-widths from 1 to 8 bits, and evaluate their
effectiveness across multiple datasets. Our findings reveal that while Qwen3
maintains competitive performance at moderate bit-widths, it experiences
notable degradation in linguistic tasks under ultra-low precision, underscoring
the persistent hurdles in LLM compression. These results emphasize the need for
further research to mitigate performance loss in extreme quantization
scenarios. We anticipate that this empirical analysis will provide actionable
insights for advancing quantization methods tailored to Qwen3 and future LLMs,
ultimately enhancing their practicality without compromising accuracy. Our
project is released on https://github.com/Efficient-ML/Qwen3-Quantization and
https://huggingface.co/collections/Efficient-ML/qwen3-quantization-68164450decb1c868788cb2b.
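The abstract describes evaluating classic post-training quantization (PTQ) at bit-widths from 1 to 8. As a minimal illustrative sketch of what low-bit weight quantization means in this setting (not one of the five methods the paper actually benchmarks), the following shows symmetric per-channel round-to-nearest (RTN) quantization, the simplest PTQ baseline; the function name and shapes are our own assumptions for illustration:

```python
import numpy as np

def rtn_quantize(weights: np.ndarray, n_bits: int) -> np.ndarray:
    """Symmetric per-channel round-to-nearest (RTN) weight quantization.

    Illustrative sketch only; real PTQ methods (e.g. calibration-based
    ones) are more sophisticated. `weights` is (out_channels, in_features).
    Returns the dequantized weights, so the error shows what n_bits costs.
    """
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 7 for 4-bit signed
    # Per-output-channel scale from the max absolute weight in that row
    scale = np.abs(weights).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # avoid division by zero
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale                           # dequantize back to float

# Usage: quantize a random weight matrix to 4 bits and measure the error
w = np.random.randn(8, 16).astype(np.float32)
w4 = rtn_quantize(w, n_bits=4)
err4 = np.abs(w - w4).mean()
```

Repeating this at 8 bits yields a much smaller reconstruction error than at 4 bits, which mirrors the paper's qualitative finding that moderate bit-widths preserve performance while ultra-low precision degrades it.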