An Empirical Study of Qwen3 Quantization
May 4, 2025
Authors: Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, Xianglong Liu
cs.AI
Abstract
The Qwen series has emerged as a leading family of open-source Large Language
Models (LLMs), demonstrating remarkable capabilities in natural language
understanding tasks. With the recent release of Qwen3, which exhibits superior
performance across diverse benchmarks, there is growing interest in deploying
these models efficiently in resource-constrained environments. Low-bit
quantization presents a promising solution, yet its impact on Qwen3's
performance remains underexplored. This study conducts a systematic evaluation
of Qwen3's robustness under various quantization settings, aiming to uncover
both opportunities and challenges in compressing this state-of-the-art model.
We rigorously assess 5 existing classic post-training quantization techniques
applied to Qwen3, spanning bit-widths from 1 to 8 bits, and evaluate their
effectiveness across multiple datasets. Our findings reveal that while Qwen3
maintains competitive performance at moderate bit-widths, it experiences
notable degradation in linguistic tasks under ultra-low precision, underscoring
the persistent hurdles in LLM compression. These results emphasize the need for
further research to mitigate performance loss in extreme quantization
scenarios. We anticipate that this empirical analysis will provide actionable
insights for advancing quantization methods tailored to Qwen3 and future LLMs,
ultimately enhancing their practicality without compromising accuracy. Our
project is released on https://github.com/Efficient-ML/Qwen3-Quantization and
https://huggingface.co/collections/Efficient-ML/qwen3-quantization-68164450decb1c868788cb2b.
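The abstract describes evaluating classic post-training quantization (PTQ) at bit-widths from 1 to 8. As a minimal illustrative sketch of what low-bit weight quantization means in this setting (not one of the five methods the paper actually benchmarks), the following shows symmetric per-channel round-to-nearest (RTN) quantization, the simplest PTQ baseline; the function name and shapes are our own assumptions for illustration:

```python
import numpy as np

def rtn_quantize(weights: np.ndarray, n_bits: int) -> np.ndarray:
    """Symmetric per-channel round-to-nearest (RTN) weight quantization.

    Illustrative sketch only; real PTQ methods (e.g. calibration-based
    ones) are more sophisticated. `weights` is (out_channels, in_features).
    Returns the dequantized weights, so the error shows what n_bits costs.
    """
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 7 for 4-bit signed
    # Per-output-channel scale from the max absolute weight in that row
    scale = np.abs(weights).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # avoid division by zero
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale                           # dequantize back to float

# Usage: quantize a random weight matrix to 4 bits and measure the error
w = np.random.randn(8, 16).astype(np.float32)
w4 = rtn_quantize(w, n_bits=4)
err4 = np.abs(w - w4).mean()
```

Repeating this at 8 bits yields a much smaller reconstruction error than at 4 bits, which mirrors the paper's qualitative finding that moderate bit-widths preserve performance while ultra-low precision degrades it.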