QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
September 22, 2025
Authors: Hyesung Jeon, Seojune Lee, Beomseok Kang, Yulhwa Kim, Jae-Joon Kim
cs.AI
Abstract
The demand for efficient deployment of large language models (LLMs) has
driven interest in quantization, which reduces inference cost, and
parameter-efficient fine-tuning (PEFT), which lowers training overhead. This has
motivated the development of quantization-aware PEFT to produce accurate yet
efficient quantized models. In this setting, reducing quantization error prior
to fine-tuning is crucial for achieving high model accuracy. However, existing
methods that rely on low-rank adaptation suffer from limited representational
capacity. Recent Fourier-related transform (FT)-based adapters offer greater
representational power than low-rank adapters, but their direct integration
into quantized models often results in ineffective error reduction and
increased computational overhead. To overcome these limitations, we propose
QWHA, a method that integrates FT-based adapters into quantized models by
employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together
with a novel adapter initialization scheme incorporating adaptive parameter
selection and value refinement. We demonstrate that QWHA effectively mitigates
quantization errors while facilitating fine-tuning, and that its design
substantially reduces computational cost. Experimental results show that QWHA
consistently outperforms baselines in low-bit quantization accuracy and
achieves significant training speedups over existing FT-based adapters. The
code is available at https://github.com/vantaa89/qwha.
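
For readers unfamiliar with the transform named above, the following is a minimal sketch of a fast Walsh-Hadamard transform in plain Python. It is not taken from the QWHA repository and does not reproduce the paper's adapter or initialization scheme; the function names `fwht` and `hadamard_matrix` are hypothetical and chosen only for illustration. The sketch assumes the unnormalized transform in Sylvester (natural) ordering and checks it against a dense Hadamard matrix product.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (Sylvester ordering).

    Illustrative helper only; not the QWHA implementation.
    Uses additions and subtractions only, in O(n log n) steps.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine pairs of entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def hadamard_matrix(n):
    """Dense Sylvester Hadamard matrix, built only as a reference check."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


x = [1, 0, 1, 0, 0, 1, 1, 0]
y = fwht(x)

# Reference result from an explicit matrix-vector product.
reference = [sum(h * v for h, v in zip(row, x)) for row in hadamard_matrix(len(x))]

# Applying the unnormalized transform twice scales the input by n.
roundtrip = [v // len(x) for v in fwht(y)]
```

Because the Hadamard matrix contains only +1 and -1 entries, the transform needs no multiplications, which is one plausible reason such a kernel can be cheaper than other Fourier-related transforms. The same function also serves as its own inverse up to a factor of 1/n.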