RandLoRA: Full-rank parameter-efficient fine-tuning of large models
February 3, 2025
Authors: Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad
cs.AI
Abstract
Low-Rank Adaptation (LoRA) and its variants have shown impressive results in
reducing the number of trainable parameters and memory requirements of large
transformer networks while maintaining fine-tuning performance. However, the
low-rank nature of the weight update inherently limits the representation power
of fine-tuned models, potentially compromising performance on complex tasks.
This raises a critical question: when a performance gap between LoRA and
standard fine-tuning is observed, is it due to the reduced number of trainable
parameters or the rank deficiency? This paper aims to answer this question by
introducing RandLoRA, a parameter-efficient method that performs full-rank
updates using learned linear combinations of low-rank, non-trainable random
matrices. Our method limits the number of trainable parameters by restricting
optimization to diagonal scaling matrices applied to the fixed random matrices.
This allows us to effectively overcome the low-rank limitations while
maintaining parameter and memory efficiency during training. Through extensive
experimentation across vision, language, and vision-language benchmarks, we
systematically evaluate the limitations of LoRA and existing random basis
methods. Our findings reveal that full-rank updates are beneficial across
vision and language tasks individually, and even more so for vision-language
tasks, where RandLoRA significantly reduces -- and sometimes eliminates -- the
performance gap between standard fine-tuning and LoRA, demonstrating its
efficacy.
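The abstract describes the weight update as a learned combination of fixed, low-rank random matrices, where only diagonal scaling matrices are trained. Below is a minimal PyTorch sketch of one plausible reading of that parameterization; the class name `RandLoRASketch`, the tensor shapes, the initialization, and the hyperparameters `rank` and `num_bases` are illustrative assumptions, not the authors' released implementation.

```python
# Minimal, illustrative sketch of a RandLoRA-style linear layer (assumptions, not reference code).
import torch
import torch.nn as nn


class RandLoRASketch(nn.Module):
    """Parameterizes the weight update as a sum of fixed low-rank random bases,
    each scaled by a trainable diagonal matrix, as described in the abstract."""

    def __init__(self, in_features, out_features, rank=4, num_bases=8):
        super().__init__()
        # Frozen pretrained weight of the layer (not updated during fine-tuning).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)

        # Fixed random low-rank bases B_i (out x r) and A_i (r x in); registered as
        # buffers so they are never trained.
        self.register_buffer("B", torch.randn(num_bases, out_features, rank) / out_features ** 0.5)
        self.register_buffer("A", torch.randn(num_bases, rank, in_features) / in_features ** 0.5)

        # Trainable diagonal scalings lambda_i (one r-vector per basis), initialized to
        # zero so the update starts at zero, as in LoRA-style methods.
        self.scales = nn.Parameter(torch.zeros(num_bases, rank))

    def delta_weight(self):
        # Delta W = sum_i B_i @ diag(lambda_i) @ A_i; the einsum sums over bases (n) and rank (r).
        return torch.einsum("nor,nr,nri->oi", self.B, self.scales, self.A)

    def forward(self, x):
        return nn.functional.linear(x, self.weight + self.delta_weight())
```

Under this reading, the trainable parameters per layer are only the `num_bases * rank` diagonal entries, far fewer than the `in_features * out_features` of standard fine-tuning, while the summed update can reach full rank once `num_bases * rank` is at least `min(in_features, out_features)`, which is how the method avoids LoRA's rank deficiency while staying parameter- and memory-efficient.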