LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
October 12, 2023
Authors: Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao
cs.AI
Abstract
Quantization is an indispensable technique for serving Large Language Models
(LLMs) and has recently found its way into LoRA fine-tuning. In this work we
focus on the scenario where quantization and LoRA fine-tuning are applied
together on a pre-trained model. In such cases, it is common to observe a
consistent gap in downstream-task performance between full fine-tuning
and the quantization-plus-LoRA fine-tuning approach. In response, we propose LoftQ
(LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that
simultaneously quantizes an LLM and finds a proper low-rank initialization for
LoRA fine-tuning. Such an initialization alleviates the discrepancy between the
quantized and full-precision model and significantly improves the
generalization in downstream tasks. We evaluate our method on natural language
understanding, question answering, summarization, and natural language
generation tasks. Experiments show that our method is highly effective and
outperforms existing quantization methods, especially in the challenging 2-bit
and 2/4-bit mixed precision regimes. We will release our code.
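
The abstract describes jointly quantizing a pre-trained weight matrix and finding a low-rank initialization for LoRA so that the quantized backbone plus the adapters stays close to the full-precision weights. The sketch below illustrates one way such an alternating initialization could look; it is not the authors' released implementation. The uniform quantizer is a toy stand-in for the low-bit quantizers used in practice, and the function names (`uniform_quantize`, `loftq_init`) are illustrative.

```python
# Minimal sketch, assuming a simple symmetric uniform quantizer as a placeholder
# for the paper's low-bit (e.g., 2-bit) quantization scheme.
import torch


def uniform_quantize(weight: torch.Tensor, num_bits: int = 2) -> torch.Tensor:
    """Toy symmetric uniform quantizer (placeholder, not the paper's quantizer)."""
    max_level = 2 ** (num_bits - 1) - 1
    scale = weight.abs().max() / max(max_level, 1) + 1e-12
    return torch.round(weight / scale).clamp(-max_level, max_level) * scale


def loftq_init(weight: torch.Tensor, rank: int = 16, num_bits: int = 2, iters: int = 5):
    """Alternate between quantizing the residual and refitting rank-r factors
    so that Q + A @ B approximates the full-precision weight matrix."""
    A = torch.zeros(weight.shape[0], rank)
    B = torch.zeros(rank, weight.shape[1])
    for _ in range(iters):
        # Quantize the part the low-rank adapters do not yet capture.
        Q = uniform_quantize(weight - A @ B, num_bits)
        # Refit the low-rank factors to the remaining quantization error via SVD.
        U, S, Vh = torch.linalg.svd(weight - Q, full_matrices=False)
        A = U[:, :rank] * S[:rank]
        B = Vh[:rank, :]
    return Q, A, B  # Q initializes the frozen backbone; A, B initialize the LoRA adapters.


# Usage sketch: initialize one linear layer's quantized weight and LoRA factors.
W = torch.randn(1024, 1024)
Q, A, B = loftq_init(W, rank=16, num_bits=2)
```

In this sketch, the low-rank factors absorb part of the quantization error before fine-tuning begins, which is the stated mechanism for narrowing the gap between the quantized and full-precision model.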