QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

Abstract

The demand for efficient deployment of large language models (LLMs) has driven interest in quantization, which reduces inference cost, and parameter-efficient fine-tuning (PEFT), which lowers training overhead. This has motivated the development of quantization-aware PEFT, which aims to produce quantized models that are both accurate and efficient. In this setting, reducing quantization error before fine-tuning is crucial for achieving high model accuracy. However, existing methods that rely on low-rank adaptation suffer from limited representational capacity. Recent Fourier-related transform (FT)-based adapters offer greater representational power than low-rank adapters, but integrating them directly into quantized models often yields ineffective error reduction and increased computational overhead. To overcome these limitations, we propose QWHA, a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together with a novel adapter initialization scheme that incorporates adaptive parameter selection and value refinement. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost. Experimental results show that QWHA consistently outperforms baselines in low-bit quantization accuracy and achieves significant training speedups over existing FT-based adapters. The code is available at https://github.com/vantaa89/qwha.
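
For readers unfamiliar with the transform kernel named above, the following is a minimal sketch of the standard fast Walsh-Hadamard transform. It illustrates the transform only, not the QWHA adapter or its initialization scheme; the function name `fwht` is a hypothetical choice for this illustration and is not taken from the released code. The Hadamard matrix has entries of only +1 and -1, so the transform needs additions and subtractions but no multiplications.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform of a sequence.

    Illustrative helper (hypothetical name, not from the QWHA repository).
    The input length must be a power of two. The loop performs
    O(n log n) additions and subtractions and no multiplications.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)  # work on a copy so the input is left unchanged
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out
```

Applying `fwht` twice returns the input scaled by its length, so the inverse transform is the same routine followed by a division by `n`. For example, `fwht([1, 0, 0, 0])` returns `[1, 1, 1, 1]`.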
