QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
September 22, 2025
Authors: Hyesung Jeon, Seojune Lee, Beomseok Kang, Yulhwa Kim, Jae-Joon Kim
cs.AI
Abstract
The demand for efficient deployment of large language models (LLMs) has
driven interest in quantization, which reduces inference cost, and
parameter-efficient fine-tuning (PEFT), which lowers training overhead. This has
motivated the development of quantization-aware PEFT to produce accurate yet
efficient quantized models. In this setting, reducing quantization error prior
to fine-tuning is crucial for achieving high model accuracy. However, existing
methods that rely on low-rank adaptation suffer from limited representational
capacity. Recent Fourier-related transform (FT)-based adapters offer greater
representational power than low-rank adapters, but their direct integration
into quantized models often results in ineffective error reduction and
increased computational overhead. To overcome these limitations, we propose
QWHA, a method that integrates FT-based adapters into quantized models by
employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together
with a novel adapter initialization scheme incorporating adaptive parameter
selection and value refinement. We demonstrate that QWHA effectively mitigates
quantization errors while facilitating fine-tuning, and that its design
substantially reduces computational cost. Experimental results show that QWHA
consistently outperforms baselines in low-bit quantization accuracy and
achieves significant training speedups over existing FT-based adapters. The
code is available at https://github.com/vantaa89/qwha.
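
To make the role of the WHT kernel concrete, here is a minimal pure-Python sketch of a fast Walsh-Hadamard transform and of a sparse-coefficient weight update built on it. The abstract does not spell out the adapter construction or the initialization scheme, so everything here is an assumption for illustration: the helper names (`fwht`, `wht2d`, `inverse_wht2d`, `topk_sparsify`) are hypothetical, the random matrix merely stands in for a quantization error W - Q(W), and the naive top-k coefficient selection is not the paper's adaptive parameter selection or value refinement.

```python
import random


def fwht(vec):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two).

    Uses only additions and subtractions, which is why a WHT kernel is
    cheaper than transforms that need multiplications by cos/sin factors.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def wht2d(mat):
    """Transform every row, then every column: computes H_m @ mat @ H_n."""
    rows = [fwht(r) for r in mat]
    cols = [fwht(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def inverse_wht2d(coeffs):
    """Invert wht2d. H is symmetric and H @ H = n * I, so rescale by m * n."""
    scale = len(coeffs) * len(coeffs[0])
    return [[v / scale for v in row] for row in wht2d(coeffs)]


def topk_sparsify(coeffs, k):
    """Keep the k largest-magnitude coefficients (naive choice, illustration only)."""
    cells = [(abs(v), i, j) for i, row in enumerate(coeffs) for j, v in enumerate(row)]
    keep = {(i, j) for _, i, j in sorted(cells, reverse=True)[:k]}
    return [[v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(coeffs)]


def frobenius_sq(mat):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(v * v for row in mat for v in row)


random.seed(0)
# Hypothetical stand-in for the quantization error W - Q(W) of one 8x8 weight block.
error = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]

coeffs = wht2d(error)                # error expressed in the WHT domain
sparse = topk_sparsify(coeffs, 16)   # only 16 of 64 coefficients would be trainable
delta = inverse_wht2d(sparse)        # dense weight update produced by the sparse adapter
residual = [[e - d for e, d in zip(er, dr)] for er, dr in zip(error, delta)]

print("error energy before:", frobenius_sq(error))
print("error energy after: ", frobenius_sq(residual))
```

Because the WHT is orthogonal up to scaling, the energy left in `residual` equals the energy of the discarded coefficients, so any nonzero kept coefficient lowers the error. A sparse coefficient matrix also yields a dense update that is not constrained to low rank, which is the representational advantage the abstract attributes to FT-based adapters.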