QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

Abstract

The demand for efficient deployment of large language models (LLMs) has driven interest in quantization, which reduces inference cost, and parameter-efficient fine-tuning (PEFT), which lowers training overhead. This has motivated the development of quantization-aware PEFT, which aims to produce quantized models that are both accurate and efficient. In this setting, reducing quantization error before fine-tuning is crucial for achieving high model accuracy. However, existing methods that rely on low-rank adaptation suffer from limited representational capacity. Recent Fourier-related transform (FT)-based adapters offer greater representational power than low-rank adapters, but integrating them directly into quantized models often yields ineffective error reduction and increased computational overhead. To overcome these limitations, we propose QWHA, a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together with a novel adapter initialization scheme that incorporates adaptive parameter selection and value refinement. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost. Experimental results show that QWHA consistently outperforms baselines in low-bit quantization accuracy and achieves significant training speedups over existing FT-based adapters. The code is available at https://github.com/vantaa89/qwha.
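To make the role of the transform kernel concrete, the following is a minimal NumPy sketch of an unnormalized fast Walsh-Hadamard transform. It is not taken from the QWHA code base: the function name `fwht`, the row-wise application, and the sparse coefficient matrix `coeffs` are illustrative assumptions, chosen only to show that the WHT needs nothing but additions and subtractions and that a handful of non-zero coefficients can map to a dense weight update.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis.

    Hypothetical helper for illustration; the last dimension must be a
    power of two. Uses only additions and subtractions (O(n log n)).
    """
    x = np.asarray(x, dtype=np.float64)
    lead, n = x.shape[:-1], x.shape[-1]
    if n == 0 or n & (n - 1):
        raise ValueError("last dimension must be a power of two")
    h = 1
    while h < n:
        # Split into blocks of size 2h and butterfly the two halves.
        y = x.reshape(*lead, n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        x = np.stack((a + b, a - b), axis=-2).reshape(*lead, n)
        h *= 2
    return x


# Assumed illustrative setup: a sparse coefficient matrix with 4 trainable
# entries, transformed row-wise into a dense update of shape (8, 8).
rng = np.random.default_rng(0)
n = 8
coeffs = np.zeros((n, n))
idx = rng.choice(n * n, size=4, replace=False)
coeffs.flat[idx] = rng.standard_normal(4)
delta_w = fwht(coeffs) / n
```

Because the Hadamard matrix is its own inverse up to a factor of n, applying `fwht` twice and dividing by n recovers the input, so the coefficients can be recovered from the update at the same cost as the forward transform.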
